A Majority of Data Scientists Lack Competency in Advanced Machine Learning Areas and Techniques



Data science requires the effective application of skills in a variety of machine learning areas and techniques. A recent survey by Kaggle, however, revealed that a limited number of data professionals possess competency in advanced machine learning skills. About half of data professionals said they were competent in supervised machine learning (49%) and logistic regression (53%). Deep learning techniques were among the ML skills with the lowest competency rates: Neural Networks – GAN (7%); NN – RNNs (15%) and NN – CNNs (26%).

A majority of enterprises (80%) have some form of artificial intelligence (machine learning, deep learning) in production today. Additionally, about a third of enterprises are planning on expanding their AI efforts over the next 36 months. But who will lead these data science projects? Who will do the work? Some researchers suggest that there is not enough AI talent to fill those roles. Tencent estimates there are only 300,000 AI researchers and practitioners worldwide. ElementAI estimates there are 22,000 PhD-level researchers working in AI.

Kaggle conducted a survey in August 2017 of over 16,000 data professionals (2017 State of Data Science and Machine Learning). The survey asked respondents about their competence across a variety of AI-related approaches and techniques. Examining competency skill by skill gives a more detailed view of which specific AI skills are driving this talent gap.

Competency in Machine Learning Areas

Figure 1. Competency in Machine Learning Areas.

All respondents (employed or not) were given a list of 13 machine learning areas and asked to indicate the areas in which they consider themselves competent. The top 10 machine learning areas in which data professionals reported competency were (see Figure 1):

  1. Supervised Machine Learning (49%)
  2. Unsupervised Learning (26%)
  3. Time Series (25%)
  4. Natural Language Processing (19%)
  5. Outlier detection (16%)
  6. Computer Vision (15%)
  7. Recommendation Engines (14%)
  8. Survival Analysis (8%)
  9. Reinforcement Learning (6%)
  10. Adversarial Learning (4%)

The median respondent reported competency in two (2) machine learning areas. Competency rates varied by job title. They were higher for data professionals who self-identified as “data scientists” and “machine learning engineers,” each group reporting competency in three (3) ML areas. They were lower for data professionals who self-identified as “business analysts,” “engineers,” and “programmers,” each group reporting competency in one (1) ML area.
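To make the arithmetic behind these figures concrete, here is a minimal sketch of how per-area competency rates and the median number of areas could be derived from "select all that apply" answers. The toy responses, the function names, and the choice to keep respondents who selected nothing in the denominator are all assumptions made for illustration; they are not Kaggle's data or its actual method.

```python
from statistics import median

# Hypothetical toy responses (NOT survey data): each set holds the ML areas
# one respondent ticked in a "select all that apply" question.
responses = [
    {"Supervised Machine Learning", "Time Series"},
    {"Supervised Machine Learning"},
    set(),  # a respondent who selected no areas
    {"Unsupervised Learning", "Natural Language Processing", "Computer Vision"},
]


def competency_rates(responses):
    """Return the percentage of respondents who selected each area.

    Assumption: every respondent counts in the denominator, including
    those who selected nothing.
    """
    counts = {}
    for selected in responses:
        for area in selected:
            counts[area] = counts.get(area, 0) + 1
    total = len(responses)
    return {area: 100 * count / total for area, count in counts.items()}


def median_areas(responses):
    """Return the median number of areas selected per respondent."""
    return median(len(selected) for selected in responses)


rates = competency_rates(responses)
typical = median_areas(responses)
```

Sorting `rates` by value in descending order would give a ranked list in the same form as the top-10 list above. Restricting `responses` to one job title before calling `median_areas` would give the kind of per-title comparison described in the text.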

Competency in Machine Learning Techniques

Figure 2. Competency in Machine Learning Techniques.

The survey also asked all data professionals, employed or not, about their competency in 13 machine learning techniques (“In which areas of machine learning do you consider yourself competent? (Select all that apply)”). The top 10 machine learning techniques in which data pros reported competency were (see Figure 2):

  1. Logistic Regression (54%)
  2. Decision Trees – Random Forests (43%)
  3. Support Vector Machines (32%)
  4. Decision Trees – Gradient Boosted Machines (31%)
  5. Bayesian Techniques (27%)
  6. Neural Networks – CNNs (26%)
  7. Ensemble Methods (22%)
  8. Gradient Boosting (17%)
  9. Neural Networks – RNNs (15%)
  10. Hidden Markov Models HMMs (9%)

The median data professional reported competency in two (2) machine learning techniques. Competency rates varied by job title. Self-identified data scientists said they were competent in five (5) ML techniques, while data professionals who identified as machine learning engineers or predictive modelers said they were competent in four (4) ML techniques. Competency rates were lower for database engineers and programmers, each group reporting competency in one (1) ML technique.


Even though a few ML areas and techniques are known by about half of the data professionals (i.e., supervised machine learning, logistic regression, decision trees), there are many other ML areas and techniques in which there is a paucity of talent. A majority of data professionals lack competency in many advanced machine learning areas and techniques like neural networks, evolutionary techniques, reinforcement learning and adversarial learning.

If you’re an aspiring or practicing data professional, you need a good foundation of the basic skills, including supervised machine learning, logistic regression and decision trees. These types of skills apply to a wide variety of data science problems that focus on identifying predictors of important organizational outcomes.

If you want to set yourself apart from your data peers, you might consider becoming competent in rare AI skills like NLP, reinforcement learning, adversarial learning, recommendation engines and neural networks. In fact, 4 out of 10 data professionals said they are most excited to learn about deep learning this year. The high demand for AI talent, coupled with the limited talent supply, could result in extremely high salaries for those willing to learn these AI skills.

Republished with author's permission from original post.

