An analysis of five primary tools for analyzing data showed that data professionals most often use local development environments (54%), followed by basic statistical software (20%), cloud-based data software and APIs (8%), advanced statistical software (6%) and business intelligence software (6%). Tool usage differed across job titles.
Data professionals use a variety of data science tools, technologies and languages to help them get insights from data. Because data analysis is central to gaining insights from data, we focus here on tools for analyzing data. Kaggle recently conducted a worldwide survey of nearly 20,000 data professionals (see: 2019 Kaggle ML and Data Science Survey). The survey asked a variety of questions, including which tools respondents use to analyze data.
Top Tool Used to Analyze Data
The survey asked respondents, “What is the primary tool that you use at work or school to analyze data?” The top tool used to analyze data was, by far, local development environments (RStudio, JupyterLab, etc.) (54%). The complete list of tools (see Figure 1) was:
- Local development environments (RStudio, JupyterLab, etc.) (54%)
- Basic statistical software (Microsoft Excel, Google Sheets, etc.) (20%)
- Cloud-based data software and APIs (AWS, GCP, Azure, etc.) (8%)
- Advanced statistical software (SPSS, SAS, etc.) (6%)
- Business intelligence software (Salesforce, Tableau, Spotfire, etc.) (6%)
- Other (8%)
Figure 1. Primary tools used to analyze data.
Tool usage differed by respondents’ job title. Local development environments were the top tool for most job titles. The one exception was Business Analysts, whose top pick was basic statistical software. The second most popular tool varied by job title:
- Basic statistical software: Research Scientist, Student, Software Engineer, Data Analyst, Product/Project Management, DBA/Database Engineer
- Cloud-based data software and APIs: Data Scientist, Data Engineer
- Advanced statistical software: Statistician
- Local development environments: Business Analyst
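For readers who want to run this kind of breakdown on their own response data, here is a minimal sketch in Python using only the standard library. It computes each tool's overall share and then ranks tools within each job title, which is how a "top tool" and "second most popular tool" per title can be read off. The records below are made up purely for illustration and are not the Kaggle survey data; the function names `tool_shares` and `ranked_tools_by_title` are hypothetical helpers, not part of any library.

```python
from collections import Counter, defaultdict

LOCAL = "Local development environments"
BASIC = "Basic statistical software"
CLOUD = "Cloud-based data software and APIs"
ADVANCED = "Advanced statistical software"
BI = "Business intelligence software"

# Made-up (job title, primary tool) records for illustration only.
# These are NOT the Kaggle survey responses.
responses = (
    [("Data Scientist", LOCAL)] * 4
    + [("Data Scientist", CLOUD)] * 2
    + [("Data Scientist", BASIC)] * 1
    + [("Business Analyst", BASIC)] * 4
    + [("Business Analyst", LOCAL)] * 2
    + [("Business Analyst", BI)] * 1
    + [("Statistician", LOCAL)] * 3
    + [("Statistician", ADVANCED)] * 2
    + [("Statistician", BASIC)] * 1
)


def tool_shares(rows):
    """Return each tool's share of all responses, as a rounded percentage."""
    counts = Counter(tool for _, tool in rows)
    total = sum(counts.values())
    return {tool: round(100 * n / total) for tool, n in counts.most_common()}


def ranked_tools_by_title(rows):
    """Return, for each job title, its tools ordered from most to least used."""
    by_title = defaultdict(Counter)
    for title, tool in rows:
        by_title[title][tool] += 1
    return {
        title: [tool for tool, _ in counts.most_common()]
        for title, counts in by_title.items()
    }


shares = tool_shares(responses)
ranked = ranked_tools_by_title(responses)

for title, tools in ranked.items():
    # tools[0] is the top tool for this title, tools[1] the runner-up.
    print(f"{title}: top = {tools[0]}, second = {tools[1]}")
```

With real survey data, the same two functions would apply once each response is reduced to a (job title, primary tool) pair. Respondents who skipped either question would need to be dropped first.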
Conclusions
There are a variety of tools that data professionals can use to analyze their data. In this study, the top tools used, by far, were local development environments like RStudio and JupyterLab. These tools allow you to develop and test your analytics without sharing them with others (until you are satisfied with them). Because data scientists do a lot of their work on machine learning algorithms that require a great deal of tweaking and testing, it’s not surprising that two-thirds of them use this approach.
The second most popular tool for analyzing data is basic statistical software, including Microsoft Excel and Google Sheets. Many data professionals who may not be formally trained in analytics appear to be fairly heavy users of these tools, including business analysts, database engineers, product/project managers and software engineers. Not surprisingly, these basic statistical tools are used least by data scientists.
Advanced statistical software like SPSS and SAS is popular with statisticians. These tools let users apply much more sophisticated analytics to their data than Microsoft Excel or Google Sheets do, including principal component analysis for data reduction, regression analyses and more.
No single analytics tool will do it all, and you will likely need a set of applications for your data project, such as ML frameworks and tools. Selecting the right data analytics tool can improve your chances of success in your data projects. The results of this analysis suggest that the analytics tools you choose need to fit your needs and level of expertise.