This is the last article of the series Decoding Machine Learning.
Managing machine learning projects requires careful planning, a well-thought-out strategy, and effective use of resources to ensure they deliver the desired results within the stipulated timelines. At first it looks like a tough nut to crack, and understandably so: the stakes are high, and since data science often involves running several experiments and tests to arrive at the right answer, it can seem almost unpredictable.
This article covers practices that will help you navigate these hurdles.
Elements
Let’s start with a brief overview of the major elements that are the prerequisites of building a plan. These elements have been covered in more detail in the previous articles.
- Understanding the scope: Before beginning any machine learning project, it’s important to have a clear understanding of the scope and goals of the project. This involves defining the business problem the project aims to solve and identifying the specific objectives (business and technical specifications) that must be achieved to solve it successfully.
- Data Complexity: Assessing the data requirements is a very important step, as it helps define the constraints of your approach. Another key part is understanding data quality, which involves evaluating the completeness, accuracy, and consistency of the available data to determine whether it is suitable for the project. It also means identifying potential limitations of the data, such as missing values, non-uniformity, or biases that may impact the project’s performance; a minimal sketch of such checks follows this list.
- Approach: The strategy or general approach to the problem statement is critical. We have covered this in much more detail in the previous article.
- Plan: This is the focus of this article, covering everything from resourcing to building the final plan for executing the project.
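As a quick illustration of the data-quality checks mentioned above, here is a minimal sketch using pandas. The file and column names (`image_metadata.csv`, `image_id`, `country`, `label`) are hypothetical placeholders, not details from the project described later.

```python
import pandas as pd

# Hypothetical metadata table for a labelled image dataset;
# the file and column names are illustrative placeholders.
df = pd.read_csv("image_metadata.csv")

# Completeness: share of missing values per column.
print(df.isna().mean().sort_values(ascending=False))

# Consistency: duplicate records that can inflate apparent coverage.
print("duplicate ids:", df.duplicated(subset="image_id").sum())

# Potential bias: label balance across a grouping variable.
print(df.groupby("country")["label"].value_counts(normalize=True))
```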
Plan
How do we plan? At this stage you should have an understanding of the client requirements, a proposed approach and a comprehensive list of the tasks. Let us focus our attention on two major concepts and practices.
Using Work Breakdown Structure (WBS)
Understanding with an example
It is important to be able to break a task down into subtasks. Let’s take an example:
- The client’s requirement is to classify red / green lights at traffic stops. While the client has a model in place, they realised that there were several points of failure. In particular, certain types of stop lights were barely represented in the training dataset, causing failures in production.
- The client wants you to evaluate their training data to identify more such issues and also provide additional data to help cover the missed cases.
Now, let’s consider the overall tasks:
- Client data assessment: Evaluating the images by clustering them based on similarity, metadata and other characteristics.
- Research & defining the universe: Understanding the different ways stop lights are shown in different countries.
- Sourcing: Implementing a mechanism to source more data.
- Quality evaluation: Calculating KPIs for evaluating the sourced data.
- Delivery: Delivering the data along with the presentations and analysis.
If we were to break down the tasks into smaller components:
| Task | Sub-Tasks |
| --- | --- |
| Client data assessment | Ingesting the data into local / cloud storage |
| Client data assessment | Building code to generate metadata features, such as the height and width of images |
| Client data assessment | Building code to cluster similar images using image embeddings |
| Client data assessment | Manually reviewing clusters and summarising metadata statistics |
| Client data assessment | Identifying and documenting gaps in the data |
| Research & defining the universe | Researching the types of stop lights by country |
| Research & defining the universe | Identifying sources to scrape / fetch the data |
| Sourcing | Writing code to crawl web pages containing stop light images, or gathering manually provided URLs |
| Sourcing | Evaluating the coverage / sufficiency and quality of the extract |
| Sourcing | Downloading additional data points manually where scraping is not possible |
| Quality Evaluation | Identifying KPIs to assess the coverage, completeness and data quality |
| Quality Evaluation | Building the analysis report |
| Delivery | Finalising the data for delivery |
| Delivery | Finalising the reports / presentation of the analysis covering major statistics and insights |
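To make the highest-complexity sub-task above concrete, here is a minimal sketch of clustering images by embedding similarity. It assumes a pretrained ResNet-18 from torchvision as the feature extractor and k-means from scikit-learn; the file names and cluster count are placeholders, and the model choice is one option among many.

```python
import torch
from torchvision.models import resnet18, ResNet18_Weights
from PIL import Image
from sklearn.cluster import KMeans

# Pretrained CNN used as a generic feature extractor; the model choice
# and cluster count are illustrative assumptions, not project requirements.
weights = ResNet18_Weights.DEFAULT
model = resnet18(weights=weights)
model.fc = torch.nn.Identity()  # drop the classifier head, keep the embeddings
model.eval()
preprocess = weights.transforms()

def embed(paths):
    """Return one embedding vector per image path."""
    batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths])
    with torch.no_grad():
        return model(batch).numpy()

image_paths = ["img_001.jpg", "img_002.jpg", "img_003.jpg"]  # placeholders
labels = KMeans(n_clusters=2, n_init=10).fit_predict(embed(image_paths))
print(dict(zip(image_paths, labels)))  # cluster id per image, for manual review
```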
Definition
Officially, a work breakdown structure (WBS) is a hierarchical decomposition of the tasks required to complete a project. But that is not all a WBS requires. While developing a comprehensive WBS, we also need to categorise responsibilities based on scope, complexity, and resource requirements. Further, we assign subtasks to individuals, establish clear dependencies among tasks, and outline milestones that mark the completion of important phases. It may feel like extra effort, but a detailed WBS ensures nothing falls through the cracks and keeps team members focused on their respective roles and accountabilities.
Work estimation
Let’s say that the client has asked you to complete this activity within 7 days. Also, for a few sources identified during your preliminary research, it will be easier to download the data than to build scrapers.
Thus, in order to estimate the effort involved for each component task, you need to understand its complexity. This is usually determined based on past experience, skill set and research.
For example, if a relatively uncomplicated source needs to be scraped and the person allocated to it has prior experience, the script can be built and tested within a day. It is also important to build some buffer into your effort estimates. Generally, for a high-complexity task you can add 30% – 50% to your estimate as a buffer.
This means we classify each task as high, medium or low complexity.
| Complexity | Description |
| --- | --- |
| High | Tasks with a lot of unknowns or known difficult challenges. These challenges may force you to alter your approach to the problem. |
| Medium | Tasks where you have a general idea or approach in mind, or where you have past experience implementing something similar. |
| Low | Generally, an easy task where the requirements and approach are well known. |
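One lightweight way to apply the buffer rule above is to encode it per complexity band, as in the sketch below. The 40% figure for high complexity sits in the stated 30% – 50% range; the medium and low buffers are illustrative assumptions, since only the high band is specified above.

```python
# Buffer factor per complexity band: "high" follows the 30-50% rule
# (midpoint used); "medium" and "low" are illustrative assumptions.
BUFFER = {"high": 0.40, "medium": 0.20, "low": 0.10}

def buffered_estimate(base_days: float, complexity: str) -> float:
    """Add a contingency buffer to a base effort estimate."""
    return base_days * (1 + BUFFER[complexity])

# e.g. a scraping task estimated at 2 days and judged high complexity:
print(buffered_estimate(2, "high"))  # -> 2.8 days
```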
Similarly, the potential impact / value of the outcome can also be classified into categories (High, Medium, Low).
| Impact | Description |
| --- | --- |
| High | Highly critical tasks delivering high value to the end user. If such a task is not carried out, the project will not meet its requirements. |
| Medium | Tasks that add meaningful value to the outcome, but whose absence would not prevent the project from meeting its core requirements. |
| Low | Nice-to-have tasks that do not materially affect the project’s outcome. |
Value vs Complexity Matrix
At its core, the complexity vs. value matrix assigns relative values to tasks or features within a larger project context, allowing teams to make informed decisions on how to divide efforts and resources. Consider the example covered in this article:
| Task | Sub-Tasks | Complexity | Value |
| --- | --- | --- | --- |
| Client data assessment | Ingesting the data into local / cloud storage | Low | Low |
| Client data assessment | Building code to generate metadata features, such as the height and width of images | Medium | Medium |
| Client data assessment | Building code to cluster similar images using image embeddings | High | High |
| Client data assessment | Manually reviewing clusters and summarising metadata statistics | Medium | High |
| Client data assessment | Identifying and documenting gaps in the data | Low | High |
| Research & defining the universe | Researching the types of stop lights by country | Medium | High |
| Research & defining the universe | Identifying sources to scrape / fetch the data | Low | Medium |
| Sourcing | Writing code to crawl web pages containing stop light images, or gathering manually provided URLs | Medium | Medium |
| Sourcing | Evaluating the coverage / sufficiency and quality of the extract | Low | Medium |
| Sourcing | Downloading additional data points manually where scraping is not possible | Medium | Medium |
| Quality Evaluation | Identifying KPIs to assess the coverage, completeness and data quality | Medium | High |
| Quality Evaluation | Building the analysis report | Low | High |
| Delivery | Finalising the data for delivery | Low | High |
| Delivery | Finalising the reports / presentation of the analysis covering major statistics and insights | Low | High |
Let’s decode the matrix.
High value, Low complexity: Tasks falling into this category may not require a lot of effort, but they hold significant weight in achieving project goals. Addressing these tasks early on sets a foundation for continued progress while yielding substantial benefits.
Low value, Low complexity: Routine jobs without much individual significance. While they may be necessary for meeting project objectives, they can consume disproportionate effort unless managed carefully. Avoid investing too heavily in these trivial matters, as they will not significantly contribute to the project’s outcome.
High value, High complexity: With elevated stakes come complicated processes demanding intense focus and specialised skills. These tasks present both opportunities and challenges, and managing them well requires experienced leadership, teamwork, and clear communication channels.
Similarly, you can consider the rest of the combinations as well.
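To illustrate how the matrix can drive scheduling, here is a small sketch that orders tasks so that high-value, low-complexity work comes first. The scoring scheme is one reasonable choice, not a prescribed standard.

```python
# Order tasks so that high-value, low-complexity work is scheduled first.
# The scoring scheme is an illustrative choice, not a prescribed standard.
VALUE = {"High": 3, "Medium": 2, "Low": 1}
COMPLEXITY = {"High": 3, "Medium": 2, "Low": 1}

tasks = [  # (sub-task, complexity, value), taken from the matrix above
    ("Ingesting the data into local / cloud storage", "Low", "Low"),
    ("Building code to cluster similar images using image embeddings", "High", "High"),
    ("Identifying and documenting gaps in the data", "Low", "High"),
]

def priority(task):
    _, complexity, value = task
    # Higher value first; among equal value, lower complexity first.
    return (-VALUE[value], COMPLEXITY[complexity])

for name, complexity, value in sorted(tasks, key=priority):
    print(f"{value} value / {complexity} complexity: {name}")
```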
Finally, we are close to putting our plan in shape. The only components left are deciding the tooling (which can be chosen based on the requirements) and allocating the tasks to the respective resources.
About Us
Data Science Discovery is a step on the path of your data science journey. Please follow us on LinkedIn to stay updated.
About the writers:
- Ujjayant Sinha: Data scientist with professional experience in market research and machine learning in the pharma domain across computer vision and natural language processing.
- Ankit Gadi: Driven by a knack and passion for data science, coupled with a strong foundation in Operations Research and Statistics, which helped him embark on his data science journey.