Who Does the Machine Learning and Data Science Work?



A survey of over 19,000 data professionals showed that nearly two-thirds of respondents analyze data to influence product or business decisions. Only one quarter said they do research to advance the state of the art of machine learning. Work activity profiles differ across data roles, with Data Scientists engaging in more work activities than other data professionals.

We know that data professionals, when working on data science and machine learning projects, spend their time on a variety of different activities (e.g., gathering data, analyzing data, communicating to stakeholders) to complete those projects. Today’s post will focus on the broad work activities (or projects) that make up their roles at work, including “Build prototypes to explore applying machine learning to new areas” and “Analyze and understand data to influence product or business decisions”. Toward that end, I will use the data from the recent Kaggle survey of over 19,000 data professionals in which respondents were asked a variety of questions about their analytics practices, including their job title, work experience and the tools and products they use.

Top Work Activities (Projects) that Make up Data Professionals’ Roles

The survey respondents were asked to “Select any activities that make up an important part of your role at work: (Select all that apply).” Respondents indicated that a median of two activities make up an important part of their role. The full list of activities (shown in Figure 1) was:

  1. Analyze and understand data to influence product or business decisions (63%)
  2. Build prototypes to explore applying machine learning to new areas (52%)
  3. Experimentation and iteration to improve existing ML models (39%)
  4. Build and/or run the data infrastructure that my business uses for storing, analyzing, and operationalizing data (37%)
  5. Build and/or run a machine learning service that operationally improves my product or workflows (35%)
  6. Do research that advances the state of the art of machine learning (25%)
  7. None of these activities are an important part of my role at work (6%)
  8. Other (3%)

Figure 1. Activities that Make Up Important Parts of Data Professionals’ Role
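
For readers who want to run a similar tally on their own survey export, here is a minimal sketch of how percentages like those above and the median number of activities per respondent could be computed for a "select all that apply" question. The responses and activity labels below are made-up toy data, not the Kaggle survey data, and the function names are hypothetical.

```python
from collections import Counter
from statistics import median

# Hypothetical toy responses (NOT the Kaggle data): each respondent's
# "select all that apply" answer is stored as a set of activity labels.
responses = [
    {"analyze_data", "build_prototypes"},
    {"analyze_data"},
    {"analyze_data", "improve_models", "run_infrastructure"},
    {"research"},
    set(),  # this respondent selected nothing
]


def activity_shares(responses):
    """Return {activity: fraction of respondents who selected it}."""
    counts = Counter(activity for selected in responses for activity in selected)
    n = len(responses)
    return {activity: count / n for activity, count in counts.items()}


def median_activities(responses):
    """Return the median number of activities selected per respondent."""
    return median(len(selected) for selected in responses)


shares = activity_shares(responses)
# Print activities from most to least frequently selected.
for activity, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{activity}: {share:.0%}")
print("median activities per respondent:", median_activities(responses))
```

Because respondents can select several activities, the percentages are computed against the number of respondents rather than the number of selections, so they can sum to more than 100%, as they do in the list above.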

The top work activity was practical in nature, helping the company improve how it runs the business: analyzing data to influence products and decisions. The work activity with the lowest endorsement was more theoretical in nature: doing research that advances the state of the art of machine learning.

Work Activities by Different Data Roles

Next, I examined whether work activities differ across data roles (as indicated by respondents’ job titles). I looked at five job titles for this analysis. The results revealed a couple of interesting findings (see Figure 2):

First, respondents who self-identified as Data Scientists indicated that they are involved in 3 (median) activities at work, compared with 2 (median) for the other respondents.

Second, we see that the profile of work activities varies greatly across different data roles. While many of the respondents indicated that analysis and understanding of data to influence products/decisions was the top activity for them, a top activity for Research Scientists was doing research that advances the state of the art of machine learning. Additionally, the top activity for Data Engineers was building and/or running the data infrastructure.

Figure 2. Typical work activities vary across different data roles.
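
The comparison by role can be sketched the same way: group respondents by job title, then find each group's most frequently selected activity and its median number of activities. This is an illustrative sketch only; the records, labels, and the `profile_by_role` helper are hypothetical and do not reproduce the survey data or the exact analysis behind Figure 2.

```python
from collections import Counter, defaultdict
from statistics import median

# Hypothetical toy records (NOT the Kaggle data):
# (job_title, set of activities the respondent selected).
records = [
    ("Data Scientist", {"analyze_data", "build_prototypes", "improve_models"}),
    ("Data Scientist", {"analyze_data", "build_prototypes", "run_ml_service"}),
    ("Data Scientist", {"analyze_data", "improve_models", "research"}),
    ("Data Engineer", {"run_infrastructure", "analyze_data"}),
    ("Data Engineer", {"run_infrastructure", "run_ml_service"}),
    ("Data Engineer", {"run_infrastructure"}),
    ("Research Scientist", {"research", "build_prototypes"}),
    ("Research Scientist", {"research", "improve_models"}),
    ("Research Scientist", {"research"}),
]


def profile_by_role(records):
    """Return {job_title: (top_activity, median_activity_count)}."""
    by_role = defaultdict(list)
    for title, activities in records:
        by_role[title].append(activities)

    profiles = {}
    for title, answers in by_role.items():
        counts = Counter(a for selected in answers for a in selected)
        # Iterating in sorted order breaks ties alphabetically.
        top_activity = max(sorted(counts), key=counts.get)
        profiles[title] = (top_activity, median(len(s) for s in answers))
    return profiles


profiles = profile_by_role(records)
for title, (top_activity, med) in profiles.items():
    print(f"{title}: top activity = {top_activity}, median activities = {med}")
```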

Summary and Conclusions

The top work activities for data professionals appear to be practical ones that are necessary to run day-to-day business operations: influencing business decisions, building prototypes to extend machine learning to new areas, and improving ML models. The bottom activity concerned the longer-term understanding of machine learning: conducting research to advance the state of the art.

Different data roles have different activity profiles, and the top work activities tend to match the skill sets of each role. Building and running the data infrastructure was the top activity for Data Engineers; doing research to advance the field of machine learning was a top activity for Research Scientists. These results are not surprising, as we know that different data professionals have different skill sets. In prior research, I found that data professionals who self-identified as Researchers have a strong math/statistics/research skill set. Developers, on the other hand, have strong programming/technology skills. And data professionals who were Domain Experts have strong business-domain knowledge. Data science and machine learning work really is a team sport. Building data teams whose members have complementary skill sets will likely improve the success rate of data science projects.

Remember that data professionals have their unique skill set that makes them a better fit for some data roles than others. When applying for data-related positions, it might be useful to look at the type of work activities for which you have experience (or are competent) and apply for the positions with corresponding job titles. For example, if you are proficient in running a data infrastructure, you might consider focusing on Data Engineer jobs. If you have a strong skill set related to research and statistics, you might be more likely to get a call back when applying for Research Scientist positions.

