A recent survey of over 16,000 data professionals showed that the most used platforms/resources were Kaggle, Online courses and Stack Overflow Q&A. The platforms/resources rated most useful were Personal Projects, Online courses and Stack Overflow Q&A. Data pros typically used three different platforms/resources to learn data science skills.
There are many ways to acquire data science skills, including online courses, blogs, textbooks, trade books, YouTube videos and more. Which approach should an aspiring data professional use to learn data science skills? To answer this question, I used data from the Kaggle 2017 State of Data Science and Machine Learning survey of over 16,000 data professionals (survey data collected in August 2017). This comprehensive survey asked respondents a variety of questions about their education and work practices.
Most Used Platforms and Resources to Learn Data Science
The survey asked respondents, “What platforms & resources have you used to continue learning data science skills? (Select all that apply).” Results appear in Figure 1 and show that the top 10 platforms/resources used were:
- Kaggle (40% used this resource)
- Online courses (36%)
- Stack Overflow Q&A (34%)
- YouTube Videos (32%)
- Personal Projects (29%)
- Blogs (29%)
- Textbook (25%)
- College/University (20%)
- Arxiv (15%)
- Official documentation (14%)
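As a rough illustration of how percentages like these can be derived from a "select all that apply" question, here is a minimal sketch. It assumes each respondent's answer is stored as one comma-separated string; the toy responses and the `usage_share` helper are hypothetical and are not taken from the survey file.

```python
from collections import Counter

# Toy answers (assumption): one comma-separated string per respondent.
responses = [
    "Kaggle,Online courses,Stack Overflow Q&A",
    "Kaggle,YouTube Videos",
    "Online courses,Blogs,Textbook",
    "Kaggle",
]


def usage_share(responses):
    """Return {platform: percent of respondents who selected it}, most used first."""
    counts = Counter()
    for answer in responses:
        # A set ensures a platform is counted at most once per respondent.
        counts.update({p.strip() for p in answer.split(",") if p.strip()})
    n = len(responses)
    return {platform: 100 * c / n for platform, c in counts.most_common()}


shares = usage_share(responses)
for platform, pct in shares.items():
    print(f"{platform}: {pct:.0f}% used this resource")
```

Because respondents can select several options, the percentages do not sum to 100.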
Results revealed that data professionals have used a median of three platforms/resources to learn data science. The number of platforms/resources used varied significantly across job titles. Data professionals who self-identified as a Data Scientist, Machine Learning Engineer, Predictive Modeler, Researcher or Scientist/Researcher reported using four platforms. Data pros who self-identified as a Computer Scientist, Data Miner or Programmer used only two platforms.
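The per-title comparison can be sketched along these lines. The rows below are made up for illustration, and `median_platforms_by_title` is a hypothetical helper; the sketch only shows the counting and the median, not any significance test.

```python
from collections import defaultdict
from statistics import median

# Toy data (assumption): (job title, platforms selected) per respondent.
rows = [
    ("Data Scientist", ["Kaggle", "Blogs", "Textbook", "Arxiv"]),
    ("Data Scientist", ["Kaggle", "Online courses", "Blogs", "YouTube Videos"]),
    ("Programmer", ["Stack Overflow Q&A", "Kaggle"]),
    ("Programmer", ["Online courses"]),
    ("Programmer", ["Kaggle", "Blogs", "Textbook"]),
]


def median_platforms_by_title(rows):
    """Return {job title: median number of distinct platforms used}."""
    counts = defaultdict(list)
    for title, platforms in rows:
        counts[title].append(len(set(platforms)))
    return {title: median(n_used) for title, n_used in counts.items()}


medians = median_platforms_by_title(rows)
for title, m in medians.items():
    print(f"{title}: median of {m} platforms")
```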
Groupings of Platforms and Resources
I conducted a principal component analysis on the usage metric (0 = not used; 1 = used) to identify naturally occurring platform groupings. I found a clear 4-component solution, showing that specific learning platforms/resources tend to be used in conjunction with other platforms.
The four components (platform/resource groupings) are:
- Applied: This grouping consisted of platforms/resources that provide an applied approach to learning data science. These platforms/resources included:
  - Blogs, Kaggle, Official documentation, Online courses, Personal Projects, Stack Overflow Q&A, Textbook and YouTube Videos
- Traditional: This grouping consisted of platforms that reflected a more traditional way of learning. The platforms/resources included:
  - College/University, Company internal community, Conferences, Friends network and Tutoring/Mentoring
- Scientific: This component is represented by one platform, a repository of scientific papers from a variety of fields, including mathematics, physics, astronomy, computer science, quantitative biology, statistics, and quantitative finance. The platform/resource includes:
  - Arxiv
- Casual: This component consists of platforms/resources that contain easily consumable content for a good overview of the topic. The platforms/resources included:
  - Newsletters, Non-Kaggle online communities, Podcasts and Trade books
These four platform groupings tell us that data professionals tend to use specific platforms together. If you use Blogs, say, you are also likely to use other platforms in the “Applied” grouping.
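For readers who want to try a similar grouping analysis, the sketch below runs a principal component analysis on a simulated 0/1 usage matrix. It is not the code behind the results reported here. The data are simulated, the six columns stand in for hypothetical platforms, and several choices are assumptions made only for this example: PCA on the correlation matrix, no rotation, and assigning each platform to the component on which it has the largest absolute loading.

```python
import numpy as np

rng = np.random.default_rng(0)
n_respondents = 500

# Simulated data (assumption): two latent learning styles drive usage.
# Columns 0-3 follow style A, columns 4-5 follow style B, and 10% of the
# answers are flipped at random.
style_a = rng.random(n_respondents) < 0.5
style_b = rng.random(n_respondents) < 0.5
flips = rng.random((n_respondents, 6)) < 0.1
used = np.column_stack([style_a] * 4 + [style_b] * 2) ^ flips
X = used.astype(float)  # 0 = not used, 1 = used


def pca_loadings(X, n_components):
    """PCA via eigendecomposition of the correlation matrix of X."""
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    # Scale eigenvectors so entries are variable-component correlations.
    loadings = eigvecs[:, top] * np.sqrt(eigvals[top])
    return eigvals[top], loadings


eigvals, loadings = pca_loadings(X, n_components=2)
# Eigenvector signs are arbitrary, so compare absolute loadings.
groups = np.abs(loadings).argmax(axis=1)
print("eigenvalues:", np.round(eigvals, 2))
print("component assigned to each platform:", groups)
```

On real survey data the number of components to keep is a judgment call (for example, inspecting the eigenvalues), and a rotation such as varimax is often applied before interpreting the loadings.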
Most Useful Platforms and Resources to Learn Data Science
Respondents were asked to indicate the usefulness of each platform/resource they used. The results appear in Figure 2. In general, all platforms received high marks: each was rated useful (“somewhat useful” or “very useful”) by 90% or more of the respondents who used it. The distinction among the platforms appears when we separate the “somewhat” from the “very.” The top 10 most useful platforms and resources to learn data science, ranked by the share of “very useful” ratings, are:
- Personal Projects (74% very useful)
- Online courses (70%)
- Stack Overflow Q&A (63%)
- Kaggle (62%)
- Tutoring/Mentoring (58%)
- Textbook (55%)
- College/University (55%)
- Arxiv (55%)
- Official documentation (52%)
- Non-Kaggle online communities (49%)
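The split between "useful" and "very useful" can be computed as sketched below. The ratings, the rating labels and the `usefulness_shares` helper are hypothetical; the only point carried over from the text is that each platform is rated solely by the respondents who used it.

```python
from collections import Counter

# Toy ratings (assumption): one rating per respondent who used the platform.
ratings = {
    "Personal Projects": ["Very useful", "Very useful", "Somewhat useful", "Very useful"],
    "Blogs": ["Somewhat useful", "Very useful", "Not useful", "Somewhat useful"],
}


def usefulness_shares(ratings):
    """Return percent 'very useful' and percent useful (somewhat or very) per platform."""
    shares = {}
    for platform, answers in ratings.items():
        counts = Counter(answers)
        n = len(answers)
        very = 100 * counts["Very useful"] / n
        somewhat = 100 * counts["Somewhat useful"] / n
        shares[platform] = {"very": very, "useful": very + somewhat}
    return shares


result = usefulness_shares(ratings)
for platform, s in result.items():
    print(f"{platform}: {s['very']:.0f}% very useful, {s['useful']:.0f}% useful overall")
```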
Data professionals have access to a variety of platforms and resources to continue learning data science skills. While data professionals typically used three platforms, no single platform was used by more than 40% of them. Popular platforms included Kaggle, Online courses and Stack Overflow Q&A. Platforms that were rated as most useful included Personal Projects, Online courses and Stack Overflow Q&A.
Whether you are an aspiring or seasoned data professional, continuing to learn new data science skills is essential to staying relevant (being employed). I hope the results of the current analysis can help you identify the best way you can acquire new data science skills.