I recently made a Batman analogy when discussing the topic of data science with some colleagues. In this post, I will explore this analogy further.
Last week, a group of data professionals, including Jennifer Shin, Dion Hinchcliffe, Joe McKendrick, Joe Caserta and me, sat down with theCube’s Dave Vellante and John Walls for a panel discussion on the topic of data science in support of IBM’s 75-minute Data Science for All broadcast. Two excellent summaries of our discussion appear on Forbes and siliconANGLE. During the discussion, I made a Batman analogy with respect to the practice of data science. I’d like to explore that idea more here.
The Utility (Belt) of Data Science Tools and Skills
Despite what you might read about data science professionals being superhuman creatures who work magic on data, they’re not. They are mere mortals. I like to think that data science professionals are more like Batman than Superman. Superman is not of this planet and possesses supernatural powers. Batman, like data science professionals, does not possess magical powers; he can’t fly on his own, he can’t be invisible on his own. Instead, what transforms Bruce Wayne into the superhero Batman (besides the leotards) are the tools he has in his utility belt, each one giving him the power to get out of different predicaments.
Similarly, data professionals bring many different data science tools and skills to bear on the data problems they confront. The more data science tools they have available to them, the better equipped they are to handle those problems. While Batman has access to over 25 tools in his utility belt (e.g., batarangs, lock pick, cryptographic sequencer and even kryptonite) to help him get out of his various life-threatening situations, data professionals possess many different data science skills that help them handle Big Data problems. We studied 25 data science skills that make up the meat of data science (see Figure 1).
Batmen and Data Science Roles
There are many different types of Batman characters. We have seen the campy Batman, the gothic Batman and even the serious Batman. In the field of data science, we also find that not all data professionals are created equal.
We found that there are three broad types of data professionals who practice data science: Researchers, Developers and Domain Experts. Each of these types of data professionals is defined by the skills and tools they possess (see Figure 3). Data professionals who call themselves Researchers are strong in math and stats and weak in other areas. Data pros who call themselves Developers are stronger in technology and programming but less proficient in statistics/math and domain knowledge. Domain Experts are stronger in domain-specific knowledge than they are in statistics or technology/programming.
So, while each of these different data actors might call themselves “data scientists,” they really are quite different from each other, and each possesses a specific set of complementary data science skills. In fact, to effectively practice data science, you will need to have these three types of data professionals working together to solve problems. Domain Experts ask the right questions, Developers help get access to the data, and Researchers analyze the data to answer the questions.
Holy Training and Education, Batman!
Batman’s utility belt is not solely responsible for his crimefighting prowess. He received years of training to gain different skills he needed to make use of the tools in his utility belt. Similarly, data professionals will require some form of training to gain the skills necessary to effectively practice data science.
In our study of data scientists, we found that over half of them hold either a Master’s or PhD degree and about a quarter of them hold a 4-year degree. We also found that the level of educational attainment is related to proficiency in data science skills (more advanced degrees are associated with greater proficiency), but only for specific types of data scientists. In general, research data scientists who hold a PhD are more proficient in Statistics and Math & Modeling compared to their counterparts who hold a Master’s or 4-year college degree. Also, business management data scientists who hold a Master’s degree are more proficient in Business, Math & Modeling and Statistics than their peers who hold a 4-year degree. We did not, however, see any skill difference between Developers who hold a Master’s degree and those who hold a 4-year degree.
Batgirl and Gender Equality/Diversity
Batgirl represents the women data professionals. Like Batman, Batgirl also possessed a utility belt (and utility pocket book) that contained useful tools to fight crime. That is, she was just as effective as Batman at using those tools to solve problems. Let’s not forget about the power of women in the data science field. Our research showed that women possess levels of proficiency in data science skills similar to those of their male counterparts. We also found that women are more likely to practice data science as researchers (63% of female data professionals are researchers) than their male counterparts (36% of male professionals are researchers).
Even though women possess comparable data science skills to men, women data professionals make less than their male counterparts. Batgirl has something to say about that.
Data Science for All? Precisely!
Batman is as powerful as the tools he has in his utility belt. Similarly, data scientists are as effective as the tools they have available to them. Many technology vendors provide useful analytics tools to assist data professionals in their analytics efforts. For example, IBM has a variety of tools for data professionals with varying skill levels. SPSS helps data professionals with advanced statistics knowledge manage and analyze data to solve riddles and problems. Data Science Experience helps diverse data professionals work together as a crime-fighting data science team. And Watson Analytics helps the data novice (Robin?) explore their data using natural language queries.
You can watch the panel discussion below to see our take on the evolution of data science. The video is no “Batman vs Superman: The Movie,” but that’s a good thing.
Disclosure: IBM assisted in travel and lodging expenses for the event.