Machine learning (ML) algorithms are changing nearly every industry. They’re increasing productivity, boosting sales and helping us make more informed decisions. Many organizations are either already leveraging the power of ML, or have it laid out in their roadmap as an opportunity worth pursuing.
Trouble is, ML is complex and you might ask yourself what you need to consider when starting an ML project. Asking that question is absolutely right! Before you kick off the ML initiative, you should take some time to plan it out. This will go a long way toward minimizing potential risks and maximizing the positive results.
That’s why we’ve put together 11 questions to ask before starting a successful machine learning project. They consider your strategy, culture, resources, and data. The questions will help you build the strategic roadmap for your ML project and bring you one step further in its implementation.
Disclaimer: The questions apply to companies where machine learning is not the core business.
1. What are your organization’s business goals?
Even if ML is started as an experiment, there should always be an end business goal that actually matters and which can be impacted by ML. The success criteria of an ML project should not be a certain % accuracy, but a business metric.
For example, imagine you’re in the retail space and your main focus now is to reduce costs. A specific business goal could be to reduce warehouse costs by 10% at the end of H2. One way to fulfill this could be via optimizations derived from ML algorithms.
2. Should ML reduce costs or increase revenue?
A successful ML project either reduces costs or increases revenue (or both), and you should define which of these goals it serves. Only then will ML have a major impact on your organization and its growth.
Examples of reducing costs with machine learning are virtual assistants and chatbots, predictive maintenance and inventory optimization. In contrast, solutions like price optimization, behavioral customer tracking via video analytics or recommender systems are implemented to increase revenue.
3. What is your clear and realistic way of measuring the success of your ML initiative?
Each ML project is different and you need to define a success metric that makes sense for your specific project and which you’ll be able to measure. Once that metric is set, you should make sure it is accepted by business people and data scientists alike.
Possible success metrics for ML initiatives include working hours spent, customer inquiry response time, or % accuracy.
4. How does your organization handle risk?
As with all innovations, there is a chance that the results of the ML initiative will not turn out as expected. Maybe you discover that you need 10x the data to do something meaningful, or that the data was simply not good enough. This is why you should account for the risk of an ML initiative failing, and be able to pivot to another one if the project does not deliver what you expected.
5. How do you acquire the right talent?
ML expertise is hard to find on the market because demand for it is high. Fully or partly outsourcing your ML project is a good alternative when you can’t get the right people (or enough of them) to work on the project, especially when timing is important and you want to implement the solution sooner rather than later.
6. Do you have a clear high-level understanding of what ML is?
Understanding the different use cases, what input ML algorithms need, and when they can’t be applied helps you make the right decisions in the project. You should conceptually understand what ML brings to the picture, for example by reading machine learning blogs and case studies, attending conferences or talking to experts.
7. Is access to information guaranteed?
When starting ML initiatives, data scientists need to have easy access to information. They will need to work together with key people, possibly from different departments, such as IT. Those people need to support the project with their business knowledge, and internal bureaucracy can become a major constraint. Ideally and in time, ML and data science should be transversal to the whole organization. “Data science is the new accounting”.
8. Have you planned the initiative as a mid-term project?
Results from ML initiatives take time and you won’t be able to prove success overnight. This requires your organization to regard the machine learning initiative as a mid-term project. In the software development world, time estimations have always been a challenge. This challenge is even harder in the case of ML development because the process itself has more uncertainty. Sometimes you can be stuck for a while, and then a single breakthrough completely turns things around. That said, you should be aware that it can take several months of work until you get the desired results.
9. Is your organization collecting the right data?
Machine learning algorithms are not magic. They need data to work, and they can only be as good as the data you feed in. There are different methods and sources to collect the right data, depending on your objectives. In general, the more relevant input data you have, the better the chances that your ML model will perform well. If you have doubts about the quantity and quality of your data, you can ask data scientists to help you evaluate the datasets and find the best way to get third-party data, if necessary.
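As a rough illustration of what a first evaluation of quantity and quality can look like, here is a minimal Python sketch that counts rows, duplicate rows and missing values per field. The function name `summarize_dataset`, the field names and the sample records are all hypothetical and chosen only for this example; a real evaluation would go further.

```python
from collections import Counter


def summarize_dataset(rows, required_fields):
    """Return row count, duplicate count and missing-value rate per field."""
    total = len(rows)
    missing = Counter()
    seen = set()
    duplicates = 0
    for row in rows:
        # Two rows count as duplicates if all required fields match.
        key = tuple(row.get(field) for field in required_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
        # Treat None and empty strings as missing values.
        for field in required_fields:
            if row.get(field) in (None, ""):
                missing[field] += 1
    missing_rate = {
        field: (missing[field] / total if total else 0.0)
        for field in required_fields
    }
    return {"rows": total, "duplicates": duplicates, "missing_rate": missing_rate}


# Hypothetical retail sales records, invented for illustration.
sample = [
    {"sku": "A1", "units_sold": 12, "week": "2023-01"},
    {"sku": "A1", "units_sold": 12, "week": "2023-01"},
    {"sku": "B7", "units_sold": None, "week": "2023-01"},
    {"sku": "C3", "units_sold": 5, "week": ""},
]
report = summarize_dataset(sample, ["sku", "units_sold", "week"])
print(report)
```

A report like this won’t tell you whether the data is the right data, but it can quickly show whether there is enough clean data to start the conversation with your data scientists.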
10. Is your organization collecting the data in the right format?
Besides having the right amount and type of data, you should also make sure you’re collecting data in the right format. Imagine you have taken thousands of perfect pictures of smartphones (good resolution and white background) in order to train a computer vision model to detect them in images. Then you discover that it won’t work because the actual use case is detecting people holding smartphones in various lighting conditions, contrasts and backgrounds, and not the smartphones by themselves. Your past data collection effort would be nearly worthless, and you would need to start over. You should also understand whether bias exists in the data being collected, because ML algorithms will learn that bias.
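One simple way to catch a mismatch like the smartphone example early is to compare how often each condition appears in the collected data with how often you expect it in the real use case. The sketch below is only an illustration: the function names, the `tolerance` parameter, the background categories and the expected shares are all assumptions made up for this example.

```python
from collections import Counter


def category_shares(values):
    """Return the share of each category in a list of values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: count / total for category, count in counts.items()}


def underrepresented(collected, expected_shares, tolerance=0.5):
    """List categories whose collected share is below tolerance * expected share."""
    shares = category_shares(collected)
    return sorted(
        category
        for category, expected in expected_shares.items()
        if shares.get(category, 0.0) < tolerance * expected
    )


# Hypothetical image metadata: almost every picture has a white background.
backgrounds = ["white"] * 95 + ["indoor"] * 4 + ["outdoor"] * 1
# Assumed mix of backgrounds in the real use case.
expected = {"white": 0.1, "indoor": 0.5, "outdoor": 0.4}

flagged = underrepresented(backgrounds, expected)
print(flagged)  # ['indoor', 'outdoor']
```

The same comparison can be run on any attribute you suspect of carrying bias, as long as you can state what a representative distribution would look like.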
11. Have you taken human labeling of data into consideration?
Depending on the project, the ML solution can be based on supervised algorithms. These algorithms require the collected data to be labeled, i.e., a human needs to specify the expected outcome for each collected example so that the algorithm can learn from it. Make sure you include the cost of the people building such a dataset in your project budget.
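To get a first feeling for that budget line, a back-of-the-envelope calculation is often enough. The sketch below assumes a simple model: every example is labeled a fixed number of times, each label takes a fixed number of seconds, and labelers are paid by the hour. The function name and all the numbers are hypothetical and should be replaced with your own estimates.

```python
def labeling_cost(num_examples, seconds_per_label, hourly_rate, labels_per_example=1):
    """Estimate labeling effort in hours and its total cost."""
    hours = num_examples * labels_per_example * seconds_per_label / 3600
    return hours, hours * hourly_rate


# Assumed figures: 18,000 images, 30 seconds per label,
# each image labeled by 2 people, at a rate of 15 per hour.
hours, cost = labeling_cost(18_000, 30, 15.0, labels_per_example=2)
print(f"{hours:.0f} hours, total cost {cost:.2f}")  # 300 hours, total cost 4500.00
```

Having each example labeled by more than one person raises the cost, but it is a common way to catch labeling mistakes, so it is worth deciding on early.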