Big Data Analytics: An Evaluation of Leaders, Progress, and Community


The results of a recent annual data mining software usage survey show that the top four analytic software packages used are open source. The top 13 are shown below (I don’t consider Excel a viable data mining solution). Full survey results may be found here.

Analytic Software survey results

The top two open source solutions, which complement each other well, are R and RapidMiner. R is a powerful analytic solution that continues to gain adoption in the commercial analytics community, but it requires programming skills and has a steep learning curve; RapidMiner, by contrast, is an intuitive point-and-click, GUI-based analytic solution. Other good open source analytic solutions are available, such as KNIME, WEKA, and Orange. Orange has been coming on strong lately and shows great promise, especially because it is Python-based, which makes integration easier and adds flexibility. As Orange continues to build out its analytic functionality, I expect user adoption to grow.

Big data integration into distributed data environments such as Hadoop is where the open source (community) solutions lead the charge. Not only can you integrate R and RapidMiner with Hadoop, you can also ‘push’ some of the analytic processing directly to Hadoop through solutions such as RHadoop and RHive (R packages) and Radoop (a RapidMiner add-on). Similar solutions are also being created for other big data environments, such as Cassandra and MongoDB.
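To illustrate why ‘pushing’ processing to the data pays off, here is a minimal plain-Python sketch of the map/reduce pattern that RHadoop lets R users write for Hadoop. It does not use RHadoop, RHive, or Radoop; the in-memory lists are a stand-in for data blocks stored on separate Hadoop nodes, and the function names are hypothetical. Each partition is reduced to a small (sum, count) summary where the data lives, and only those summaries are combined to produce the overall mean.

```python
from functools import reduce
from typing import Iterable, List, Tuple

# Hypothetical stand-in: each inner list plays the role of a data block on its own node.

def map_partition(values: Iterable[float]) -> Tuple[float, int]:
    """Map step: runs next to the data and emits a compact (sum, count) summary."""
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
    return total, count


def combine(a: Tuple[float, int], b: Tuple[float, int]) -> Tuple[float, int]:
    """Reduce step: merges two partial summaries; only these small tuples move."""
    return a[0] + b[0], a[1] + b[1]


def distributed_mean(partitions: List[List[float]]) -> float:
    """Mean over all partitions, computed from per-partition summaries."""
    partials = [map_partition(p) for p in partitions]
    total, count = reduce(combine, partials, (0.0, 0))
    if count == 0:
        raise ValueError("no data in any partition")
    return total / count


if __name__ == "__main__":
    blocks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
    print(distributed_mean(blocks))  # 3.5
```

The design point is that a mean cannot be merged directly across nodes, but (sum, count) pairs can; many statistics decompose this way, which is what makes in-cluster analytic processing practical.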

Are commercial analytic solutions like SAS and SPSS being phased out? Absolutely not. First, the analytic community is growing at a fast pace, so even if the survey above shows that these tools may be slightly losing market share, the overall analytic user community is still growing at a great clip, and it is good to have a variety of solutions available. Also, some of the commercial analytic tools have deep user bases, and certain segments of those users will not defect to open-source-only solutions any time soon, for a variety of reasons. However, all commercial analytic solutions need to evolve more quickly as the open source ‘community’ continues to press forward at a rapid pace.

I have heard viewpoints pro and con for both open source and commercial analytic solutions. Regardless of your allegiance, you will most likely agree that the competition is healthy. I’m not sure which solutions will top the list in five years, but I do know that solutions that do not evolve quickly will not be in the top 10. Big data analytics integration will be one of the primary evolutionary needs in the coming years.

Republished with the author's permission from the original post.

Roman Lenzen
Roman Lenzen, Partner and Chief Data Scientist at Optumine, has delivered value-added analytical processes to several industries for 20+ years. His significant analytical, technical, and business process experience provides a unique perspective on improving process efficiency and customer profitability. Roman was previously VP of Analytics at Quaero and Director of Analytics at Merkle. Roman's education includes a Bachelor of Science degree in Mathematics from Marquette University and a Master of Science in Statistics from DePaul University.

