A recent survey revealed that 85% of data pros have used at least one ML framework in the last 5 years, while 51% have used at least one ML product over the same period. The most popular ML frameworks include Scikit-Learn, TensorFlow and Keras. The most popular ML products include SAS, Cloudera and Azure Machine Learning Studio.
The practice of data science requires the use of machine learning products and frameworks to help data professionals automate processes that drive their business forward. To better understand the adoption of these tools, I examined data from a recent worldwide study by Kaggle. This survey of 23,859 data professionals was conducted in October 2018 (2018 Machine Learning and Data Science Survey). It included a variety of questions about data science, machine learning, education and more. Kaggle released the raw survey data, and many of its members have analyzed the data (see link above).
Most Popular Machine Learning Frameworks
The survey included a question, “What machine learning frameworks have you used in the past 5 years? (Select all that apply).” Fifteen percent of data professionals surveyed said they did not use any ML frameworks in the past 5 years. The median data professional used three machine learning frameworks in the past 5 years. As seen in Figure 1, the top ML frameworks are Scikit-Learn (66% of respondents said they used this framework), followed by TensorFlow (53%) and Keras (44%). The top 10 ML frameworks are rounded out by randomForest, XGBoost, PyTorch, Caret, LightGBM, Spark MLlib and H2O.
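For readers who want to run a similar tally on the raw data, here is a minimal sketch of how select-all-that-apply answers can be turned into adoption rates, a "used at least one" share and a median count. The responses below are made up for illustration and are not the Kaggle data, and the function name `adoption_rates` is my own. I also assume the median is taken over all respondents, including those who selected nothing; the figures in this post may have been computed on a different base.

```python
from collections import Counter
from statistics import median

# Toy, made-up responses (NOT the Kaggle data):
# each entry is the set of frameworks one respondent selected.
responses = [
    {"Scikit-Learn", "TensorFlow", "Keras"},
    {"Scikit-Learn"},
    set(),  # this respondent used no framework
    {"Scikit-Learn", "TensorFlow"},
]

def adoption_rates(responses):
    """Return {framework: share of all respondents who selected it}."""
    counts = Counter(item for selected in responses for item in selected)
    n = len(responses)
    return {item: count / n for item, count in counts.items()}

rates = adoption_rates(responses)

# Share of respondents who selected at least one framework.
used_any = sum(1 for selected in responses if selected) / len(responses)

# Median number of frameworks per respondent (assumption: all respondents count).
median_count = median(len(selected) for selected in responses)

# List frameworks from most to least adopted.
for name, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {rate:.0%}")
```

Because respondents can select several frameworks, the per-framework percentages can add up to more than 100%.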
Of the data professionals who gave data scientist as their job title, 85% used Scikit-Learn, 65% used TensorFlow and 60% used Keras.
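A breakdown by job title like the one above amounts to computing the same rate inside a subset of respondents. The sketch below shows one way to do that; the records, the field names `title` and `frameworks`, and the helper `segment_rate` are all hypothetical and do not reflect the layout of the Kaggle file.

```python
# Toy, made-up records (NOT the Kaggle data).
respondents = [
    {"title": "Data Scientist", "frameworks": {"Scikit-Learn", "Keras"}},
    {"title": "Data Scientist", "frameworks": {"TensorFlow"}},
    {"title": "Student", "frameworks": set()},
    {"title": "Data Analyst", "frameworks": {"Scikit-Learn"}},
]

def segment_rate(respondents, title, framework):
    """Share of respondents with the given job title who used the framework.

    Returns None when nobody in the data has that title.
    """
    segment = [r for r in respondents if r["title"] == title]
    if not segment:
        return None
    users = sum(framework in r["frameworks"] for r in segment)
    return users / len(segment)

ds_sklearn = segment_rate(respondents, "Data Scientist", "Scikit-Learn")
print(f"Data scientists using Scikit-Learn: {ds_sklearn:.0%}")
```

The denominator here is the number of people in the segment, not the whole sample, which is why segment rates can be well above the overall rates.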
The survey also asked respondents, “Of the choices that you selected in the previous question, which ML library have you used the most?” As seen in Figure 2, a little under half (46%) of data professionals used Scikit-Learn most often. The remaining libraries trail far behind, with only 15% of data pros saying they use TensorFlow the most and 13% saying the same of Keras.
Adoption of TensorFlow increased substantially from 22% in 2017 to 53% in the current survey period. This increase was expected, as 24% of data pros in 2017 said that they were most interested in learning TensorFlow in 2018.
Most Popular Machine Learning Products
Next, I looked at adoption rates of ML products. The survey asked respondents, “Which of the following machine learning products have you used at work or school in the last 5 years?” (see Figure 3). Results showed that only about 50% of data professionals used at least one ML product in the past 5 years. The median data pro used only one ML product in the past 5 years.
Adoption rates for ML products are lower than adoption rates for ML frameworks. The most popular ML products were SAS (7.9%), Cloudera (7.4%) and Azure Machine Learning Studio (7.0%). The top 15 products also included, in descending order, Google Cloud Machine Learning Engine, Google Cloud Vision API, RapidMiner, Google Cloud Speech-to-text API, Google Cloud Natural Language API, IBM Watson Machine Learning, Amazon SageMaker, IBM Watson Studio, Azure Machine Learning Workbench (now deprecated), Google Cloud AutoML and Azure Cognitive Services.
When looking at data professionals who had the job title of data scientist, nearly 60% of them have used an ML product in the past 5 years, with the top 3 adopted products again being SAS (10%), Cloudera (9.8%) and Azure Machine Learning Studio (9.4%).
Data professionals tend to use ML frameworks at a significantly higher rate than ML products. The open-source nature of many of the ML frameworks likely makes them more attractive than the ML products. While 85% of data pros have used at least one ML framework, only about half of data pros have used an ML product.
The most popular ML framework, by far, was Scikit-Learn, followed by TensorFlow and Keras. The most popular ML product was SAS, followed by Cloudera and Azure Machine Learning Studio. The large technology vendors are represented in the top 15 ML products, including Google, Microsoft, Amazon and IBM.
I will be attending the annual IBM Think 2019 technology conference next month (Feb 12-15) in San Francisco to learn about what they are doing in the area of analytics. I am particularly looking forward to learning more about IBM Watson Studio to determine how I can use it to augment the analysis underlying my new customer survey methodology. If you are going to attend the event, please let me know if you’d like to meet up for a drink and conversation.