In a recently published article, "Online Analytics in Action", I described options for building and maintaining large online environments that allow you to analyze the data efficiently and make the results actionable across all channels. The options presented can be applied to any large analytic data environment, not just one that incorporates online data.
Below is a summary of the environment options that I put together after the article was published, including the advantages, disadvantages, and skills required for each.
As a follow-up to the article, I think it is important to outline enhancements that are either in the works or should be included in future development cycles to improve each analytic data environment option.
The enhancements include:
- Option 1 RDBMS: Continued enhancements in RDBMS solutions that allow them to scale efficiently to huge data environments, such as enhanced data compression and node parallelization.
- Option 2 Database Appliance: Embed analytic capabilities directly within the database appliance so the solution is not two-tiered (i.e. so that all data and analytic processing occurs on a single server/appliance).
- Option 3 Open Source: Continue to enable R to communicate efficiently with Hadoop. Continue to roll out tools such as Pig, Hive and Sqoop, which make data access and integration easier. Start to integrate enterprise-level BI/reporting tools for better and more automated data visualization.
Hi Roman
I think this is an interesting debate.
There are also further questions to ask around the latency of analytics and the quality of data, especially regarding the business rules that create the opportunity to exploit information, i.e. to get value from it.
The current generation of social networks and log management platforms also uses techniques that, together with real-time transactional analysis (e.g. CEM), provide relevant and appropriate rules for customer engagement.
ATB
Simon Harper