Data professionals of all stripes, including data scientists, machine learning engineers and others, use different types of tools in their jobs. A recent survey of over 20,000 data professionals by Kaggle revealed that Python, SQL and R continue to be the most popular programming languages. The most popular, by far, was Python (86% used). Additionally, 8 out of 10 data professionals recommended that aspiring data scientists learn Python first. R has seen a decline in usage over the past four years.
Kaggle conducted a worldwide survey in October 2020 of 20,036 data professionals (2020 Kaggle Machine Learning and Data Science Survey). Their survey included a variety of questions about data science, machine learning, education and more. Kaggle released the raw survey data and many of their members have analyzed the data (see link above). I will be exploring their survey data over the next couple of months. When I find something interesting, I’ll be sure to post it here on my blog. Today’s post is about the data science and machine learning programming languages data professionals used in 2020.
Most Popular Programming Languages
The survey included a question, “What programming languages do you use on a regular basis? (Select all that apply).” Data professionals used a median of 2 languages in 2020. Data professionals who had the job titles of DBA/Database Engineer, Data Engineer and Software Engineer, however, used 3 programming languages on average.
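Because this is a select-all-that-apply question, each language's usage rate is the share of respondents who ticked it, so the percentages can sum to well over 100%. Here is a minimal sketch of that tally in Python. The toy responses and the function names (`usage_share`, `median_languages`) are my own illustration, not the Kaggle data or the method behind the figures in this post.

```python
from statistics import median

# Hypothetical toy responses: each set holds the languages one respondent ticked.
# These are made-up values, not rows from the Kaggle survey.
responses = [
    {"Python", "SQL"},
    {"Python"},
    {"R", "SQL", "Python"},
    {"R"},
    {"Python", "SQL", "Java"},
]


def usage_share(responses):
    """Return {language: percent of respondents who selected it}."""
    n = len(responses)
    if n == 0:
        return {}
    counts = {}
    for selected in responses:
        for lang in selected:
            counts[lang] = counts.get(lang, 0) + 1
    return {lang: 100 * count / n for lang, count in counts.items()}


def median_languages(responses):
    """Return the median number of languages selected per respondent."""
    return median(len(selected) for selected in responses)


shares = usage_share(responses)
med = median_languages(responses)

# Print languages from most to least used.
for lang, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{lang}: {pct:.0f}%")
print(f"Median languages per respondent: {med}")
```

With real survey data, each respondent's selections would first have to be gathered from the multi-select columns before a tally like this could run.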
Comparing programming language usage with 2017, 2018 and 2019, we see that Python usage rose sharply and then leveled off (60% in 2017, 83% in 2018, 87% in 2019 and 86% in 2020). SQL usage remained roughly the same over that period (42% in 2017, 40% in 2018 and 44% in 2019). R usage, however, decreased over the four-year period (46% in 2017, 36% in 2018 and 31% in 2019).
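To make the trend easier to see, the year-over-year figures can be turned into percentage-point changes. The sketch below uses only the percentages quoted in this post. The 2020 value is included for Python alone because that is the only 2020 overall figure given here, and the helper name `point_change` is my own.

```python
# Usage percentages as quoted in this post.
# Only Python has a 2020 overall figure stated here, so SQL and R stop at 2019.
usage = {
    "Python": {2017: 60, 2018: 83, 2019: 87, 2020: 86},
    "SQL": {2017: 42, 2018: 40, 2019: 44},
    "R": {2017: 46, 2018: 36, 2019: 31},
}


def point_change(series, start, end):
    """Return the change in percentage points between two survey years."""
    return series[end] - series[start]


# Compare 2017 with 2019, the span for which all three languages have data.
changes_2017_2019 = {
    lang: point_change(series, 2017, 2019) for lang, series in usage.items()
}

for lang, delta in changes_2017_2019.items():
    print(f"{lang}: {delta:+d} percentage points (2017 to 2019)")
```

These are differences in percentage points, not relative growth rates, so a drop from 46% to 31% shows up as a change of 15 points.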
When we segment respondents based on their job title, we see that Python still reigns supreme across most of the job titles (see Figure 2). The exceptions are Statisticians, whose top programming language is R, and DBA/Database Engineers, whose top programming language is SQL.
Of the data professionals who identified as a data scientist, 94% used Python, 56% used SQL and 37% used R.
Which Programming Language is Recommended Most?
The survey also asked respondents what programming language they would recommend an aspiring data scientist learn first (see Figure 3). Results showed that 8 in 10 data professionals would recommend Python as the first programming language for aspiring data scientists to learn. The remaining programming languages were recommended at a significantly lower rate (R was recommended by 7% of respondents and SQL by 5% of respondents).
When looking at different job titles, Python was, by far, the top recommended programming language. A minor exception was Statisticians, of whom 52% recommended Python and 34% recommended R.
The results of the Kaggle survey of about 20,000 data professionals clearly show the most popular programming languages for data professionals. Python, by far, continues to be the most popular programming language used by data professionals, followed by SQL and R. While Python maintains its popularity, R has seen a decline in use by data professionals over the past four years. Additionally, Python is the most recommended programming language for aspiring data professionals. Even though data professionals have access to many different programming languages, it appears that Python is the default programming language for data science and machine learning.