Usage of Programming Languages by Data Scientists: Python Grows while R Weakens

The practice of data science, including work in machine learning and artificial intelligence, requires the use of analytics tools, technologies and programming languages. A recent Kaggle survey of nearly 20,000 data professionals revealed that Python, SQL and R continue to be the most popular programming languages. The most popular, by far, was Python (used by 87% of respondents). Additionally, nearly 8 out of 10 data professionals recommended that aspiring data scientists learn Python first. R has seen a decline in usage over the past two years.


Figure 1. Programming languages used in 2019.

Kaggle conducted a worldwide survey of 19,717 data professionals in October 2019 (2019 Kaggle Machine Learning and Data Science Survey). The survey included a variety of questions about data science, machine learning, education and more. Kaggle released the raw survey data, and many of its members have analyzed it (see link above). I will be exploring the survey data over the next couple of months. When I find something interesting, I’ll be sure to post it here on my blog. Today’s post is about the programming languages data professionals used for data science and machine learning in 2019.

Most Popular Programming Languages

The survey asked, “What programming languages do you use on a regular basis? (Select all that apply).” Data professionals used a median of 3 languages in 2019. As seen in Figure 1, the top programming language in 2019 was Python (used by 87% of respondents), followed by SQL (44%) and R (31%). Java, C++, JavaScript, Bash, C, MATLAB and TypeScript round out the top 10.
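Because respondents could select more than one language, the usage percentages add up to more than 100%. Here is a minimal sketch of how shares and the median number of languages can be tallied for a "select all that apply" question. The `responses` list is hypothetical toy data, not the Kaggle data, and the function names are my own.

```python
from collections import Counter
from statistics import median

# Hypothetical toy responses, NOT the Kaggle data: each set holds the
# languages one respondent ticked on a "select all that apply" question.
responses = [
    {"Python", "SQL", "R"},
    {"Python", "SQL"},
    {"Python"},
    {"R", "SQL", "Bash", "Python"},
    {"Java"},
]


def usage_share(responses):
    """Return {language: percent of respondents who selected it}."""
    counts = Counter(lang for selected in responses for lang in selected)
    n = len(responses)
    return {lang: 100 * count / n for lang, count in counts.items()}


def median_languages(responses):
    """Median number of languages selected per respondent."""
    return median(len(selected) for selected in responses)


shares = usage_share(responses)
# Print languages from most to least used; ties are broken alphabetically.
for lang, pct in sorted(shares.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{lang}: {pct:.0f}%")
print("Median languages per respondent:", median_languages(responses))
```

Each share is computed against the number of respondents, not the number of selections, which is why the shares in a multi-select question can sum to well over 100%.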

Of the data professionals who identified as a data scientist, 93% used Python, 57% used SQL and 41% used R.

Comparing programming language usage with 2018, we see that Python usage increased 4 percentage points (from 83% in 2018), while SQL usage remained the same (44% in 2018). R usage, however, decreased 5 percentage points from 2018 (36%) and 15 percentage points from 2017 (46%).
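The year-over-year changes above are differences in percentage points, which is not the same as relative percent change. A short sketch of both calculations, using only the Python and R figures quoted in this post (the function names are my own):

```python
# Usage figures (percent of respondents) as quoted in this post.
usage = {
    "Python": {2018: 83, 2019: 87},
    "R": {2017: 46, 2018: 36, 2019: 31},
}


def point_change(series, start, end):
    """Difference in percentage points between two survey years."""
    return series[end] - series[start]


def relative_change(series, start, end):
    """Change expressed as a percent of the starting year's value."""
    return 100 * (series[end] - series[start]) / series[start]


print("Python 2018-2019:", point_change(usage["Python"], 2018, 2019), "points")
print("R 2018-2019:", point_change(usage["R"], 2018, 2019), "points")
print("R 2017-2019:", point_change(usage["R"], 2017, 2019), "points")
print(f"R 2017-2019 relative: {relative_change(usage['R'], 2017, 2019):.1f}%")
```

A 15-point drop from a 46% base means R lost roughly a third of its 2017 share among respondents, keeping in mind that each year's survey drew a different sample.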

Which Programming Language is Recommended Most?


Figure 2. Programming language recommended by data professionals.

The survey also asked respondents which programming language they would recommend an aspiring data scientist learn first (see Figure 2). Nearly 8 in 10 data professionals recommended Python. The remaining programming languages were recommended at much lower rates (R by 9% of respondents; SQL by 6%).

Data professionals who identified as data scientists made similar recommendations for aspiring data scientists: Python (78%), R (10%) and SQL (7%).

Summary

The results of the Kaggle survey of nearly 20,000 data professionals paint a clear picture of the most popular programming languages for data professionals. Python, by far, continues to be the most popular programming language used by data professionals, followed by SQL and R. While Python continues to increase in popularity, R has seen a decline in use by data professionals over the past two years. Not surprisingly, Python is the most recommended programming language for aspiring data scientists. Even though data professionals have access to many different programming languages, it appears that Python is becoming the default programming language for data science and machine learning.
