The practice of data science requires analytics tools, technologies and languages that help data professionals extract insights and value from data. A recent survey by Kaggle revealed that data professionals relied on Python, R and SQL more than any other tools in 2017. Looking ahead to 2018, the survey results showed that data professionals are most interested in learning TensorFlow, Python and R.
Kaggle conducted a survey in August 2017 of over 16,000 data professionals (2017 State of Data Science and Machine Learning). Their survey included a variety of questions about data science, machine learning, education and more. Kaggle released the raw survey data and many of their members have analyzed the data (see link above). I will be exploring their survey data over the next couple of months. When I find something interesting, I’ll be sure to post it here on my blog. Today’s post is about the data science and machine learning tools and technologies data professionals used in 2017 and the tools and technologies that excite them for 2018.
Most Popular Data Science/Analytics Tools, Technologies and Languages in 2017
The survey included a question for data professionals who were employed, “For work, which data science/analytics tools, technologies, and languages have you used in the past year? (Select all that apply).” The median data professional used 4 tools in 2017. The most used tool in 2017 was Python (60% of respondents said they used it in the previous year), followed by R (46%) and SQL (42%). The top 10 tools are rounded out by TensorFlow, Amazon Web Services, Unix shell / awk, Tableau, C/C++, NoSQL and MATLAB/Octave.
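To make the arithmetic behind these figures concrete, here is a minimal sketch of how usage shares and the median number of tools can be computed from a "select all that apply" question. It assumes each respondent's answer is stored as one comma-separated string; the sample responses are toy values invented for illustration, not the Kaggle data, and the helper name split_tools is hypothetical.

```python
from collections import Counter
from statistics import median

# Toy responses (hypothetical, NOT the Kaggle data): each string mimics one
# respondent's answer to a "select all that apply" question.
responses = [
    "Python,SQL,TensorFlow",
    "R,SQL",
    "Python,R,SQL,Tableau",
    "Python",
    "R,MATLAB/Octave",
]


def split_tools(answer):
    """Turn a comma-separated answer into a list of tool names."""
    return [tool.strip() for tool in answer.split(",") if tool.strip()]


tool_lists = [split_tools(answer) for answer in responses]

# Share of respondents who selected each tool. Because respondents can pick
# several tools, the shares can add up to more than 100%.
counts = Counter(tool for tools in tool_lists for tool in set(tools))
shares = {tool: 100 * n / len(tool_lists) for tool, n in counts.items()}

# Median number of tools selected per respondent.
median_tools = median(len(tools) for tools in tool_lists)

for tool, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{tool}: {share:.0f}%")
print("Median tools per respondent:", median_tools)
```

With real survey data, the same steps would run over the relevant column of the raw response file instead of the toy list.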
A recent poll by KDNuggets also found that Python, R and SQL were the top tools used in 2017. While the two studies agreed on the top three data science tools, adoption rates for other tools differed sharply between them, for example RapidMiner (KDNuggets: 33%; Kaggle: 3%) and KNIME (KDNuggets: 19%; Kaggle: 3%). These differences could simply reflect differences in the samples of respondents used in each study.
I found that 71% of working data professionals used either Python, R or both in the previous year.
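The "either Python, R or both" figure counts each respondent once, so it cannot be obtained by adding the two individual percentages. The sketch below shows one way to compute such a combined share; the function name share_using_any and the sample data are hypothetical and used only for illustration.

```python
def share_using_any(tool_lists, targets):
    """Percentage of respondents who selected at least one tool in `targets`."""
    targets = set(targets)
    hits = sum(1 for tools in tool_lists if targets & set(tools))
    return 100 * hits / len(tool_lists)


# Toy data (hypothetical, NOT the Kaggle data): one list of tools per respondent.
sample = [
    ["Python", "SQL"],
    ["R"],
    ["Python", "R"],
    ["SQL", "Tableau"],
]

python_share = share_using_any(sample, {"Python"})
r_share = share_using_any(sample, {"R"})
python_or_r = share_using_any(sample, {"Python", "R"})

# The combined share is smaller than the sum of the individual shares
# because the respondent who uses both tools is counted only once.
print(f"Python: {python_share:.0f}%, R: {r_share:.0f}%")
print(f"Python, R or both: {python_or_r:.0f}%")
```

Applied to the survey figures, and assuming all three percentages share the same respondent base, 60% + 46% - 71% implies that roughly 35% of working data professionals used both Python and R.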
Which Data Science Tool and Technology are Data Pros Most Excited about Learning in 2018?
The survey also asked all data professionals (working and not) about the tool they are most excited about learning in the next year (see Figure 2). Results showed that data professionals are most interested in learning TensorFlow (24%), followed by Python (16%) and R (8%).
The results of the Kaggle survey of over 16,000 data professionals paint a clear picture of the most popular data science tools. A majority of working data professionals use Python, R or both. Python and R also rank among the tools data professionals are most excited about learning in the coming year, behind only TensorFlow.
While TensorFlow, an open source library for fast numerical computing, was the 5th most popular data science tool used last year, it garnered the most interest from data professionals for the coming year. This interest in TensorFlow is likely the result of the growing interest in developing machine learning (and deep learning) models.
While data professionals have access to many different data science tools, it appears that Python and R are becoming the standard analytics tools for the field of data science and machine learning.