Large Analytic Data Environments: What are your options?



In a recently published article, Online Analytics in Action, I provided options for building and maintaining large online environments that allow you to analyze the data efficiently and make the results actionable within all channels. The options presented may be extended to any large analytic data environment, not just one that incorporates online data.

Below is a summary of the environment options I put together after the article was published, with the advantages and disadvantages of each as well as the skills required.


As a follow-up to the article, I feel it is important to describe enhancements that are either in the works or need to be included in future development cycles to improve each analytic data environment option.

The enhancements include:

  1. Option 1 RDBMS: Continued enhancements in RDBMS solutions that allow them to scale efficiently to very large data environments, such as enhanced data compression and node parallelization.
  2. Option 2 Database Appliance: Embed analytic capabilities directly within the database appliances so the solution is not two-tiered (i.e. so all data and analytic processes occur on one server/appliance).
  3. Option 3 Open Source: Continue to enable R to communicate efficiently with Hadoop. Continue to roll out tools such as Pig, Hive and Sqoop, which enable easier data access and integration. Start to integrate enterprise-level BI/reporting tools for better and more automated data visualization.
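To make "node parallelization" a little more concrete, here is a minimal sketch of the partition-then-merge pattern that both parallel RDBMS engines and Hadoop-style tools rely on. It is only an illustration under stated assumptions: threads stand in for nodes, the function names (`partition`, `count_by_channel`, `parallel_channel_counts`) are hypothetical, and the sample events are made up for the example rather than taken from any real environment.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Split rows into n_parts slices; each slice stands in for one node's data."""
    return [rows[i::n_parts] for i in range(n_parts)]


def count_by_channel(rows):
    """Per-node work: count events per channel within a single partition."""
    return Counter(channel for channel, _ in rows)


def parallel_channel_counts(rows, n_parts=4):
    """Count each partition concurrently, then merge the partial counts."""
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = pool.map(count_by_channel, partition(rows, n_parts))
    total = Counter()
    for partial in partials:
        total.update(partial)  # merge step: add the partial counts together
    return total


# Made-up sample data: (channel, event_id) pairs, purely illustrative.
events = [("web", 1), ("email", 2), ("web", 3), ("mobile", 4), ("web", 5)]
counts = parallel_channel_counts(events, n_parts=2)
print(counts)
```

The point of the design is that each partition can be processed without seeing the others, so the per-node step scales out and only the small partial results need to be combined at the end.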


Republished with author's permission from original post.

Roman Lenzen
Roman Lenzen, Partner and Chief Data Scientist at Optumine, has delivered value-added analytical processes to several industries for 20+ years. His significant analytical, technical, and business process experience provides a unique perspective on improving process efficiency and customer profitability. Roman was previously VP of Analytics at Quaero and Director of Analytics at Merkle. Roman's education includes a Bachelor of Science degree in Mathematics from Marquette University and a Master of Science in Statistics from DePaul University.


  1. Hi Roman
    I think this is an interesting debate.
    There are also more questions to ask around the latency of analytics and the quality of data, especially looking at the business rules that provide the business opportunity of information exploitation, i.e. getting value.
    There are also techniques/platforms being used by the current generation of social networks and log management, together with transactional real-time analysis (e.g. CEM), that provide relevant and appropriate rules for customer engagement.


    Simon Harper

