Slay the Data-Overload Dragon With Distributed Processing



A large financial services organization that used SAS to build models manually achieved 92 percent accuracy in its cross-sell predictions. After moving to a data mining solution built on “distributed processing,” it saw its accuracy improve to about 98 percent.

Distributed processing is a way to accelerate traditional data processing, making predictive CRM more attainable by more businesses.

The problem of “data overload” is a major one faced by anyone with a CRM system. According to a November 2002 survey by Gartner, executives at 90 percent of companies surveyed said they suffered from information overload and that it weakened their competitiveness.

Marketing executives want to leverage their CRM databases for predictive purposes and, in doing so, improve marketing effectiveness. Predictive CRM is a process in which companies “mine” their CRM databases for patterns that help them better target customers and predict the response or risk associated with their marketing initiatives. By finding accurate ways to predict customer response, companies can maximize the effectiveness and ROI of those initiatives.

The great thing about these predictive CRM initiatives is that they yield tangible ROI metrics that can justify a company’s entire CRM program investment, if only you can get through all the data your company has amassed.

In general, predictive CRM involves developing predictive mathematical models or algorithms that identify high-likelihood outcomes in the following areas:

  • Predicting customer attrition.

  • Predicting customer acquisition.

  • Risk analysis.

  • Identifying cross-sell/up-sell opportunities.

  • Determining drivers of customer loyalty.

Choosing the best tools for predictive CRM can be difficult. It’s even harder to find the analytical talent to successfully build and implement these predictive models. The person not only needs sound technical and analytical skills but also must have a deep and intimate knowledge of the company’s business and its operations. When you are using manual statistical packages to develop predictive models, your staffing requirements can become a significant overhead on the business.

At the same time, you’re often faced with barreling through millions and even billions of records covering your entire base of customers and prospects. Contained in these databases are desirable outcome variables, such as purchase transactions and promotional response. They also contain hundreds of descriptive variables, including demographics, psychographics, customer satisfaction ratings and the length of time since and between transactions, that have predictive value related to those outcomes. One of the dilemmas of predictive modeling is how to muster the intelligence and horsepower to mine patterns from these very large databases and build the most accurate predictive models.

That’s where distributed processing comes in. Distributed processing takes large, CPU-intensive operations and divides them among many parallel PC workstations, sharing the burden and achieving considerable gains in operating speed.

By applying distributed processing technology, the financial services organization discussed earlier was able to test and evaluate five times more predictive models in the same period of time than it could with manual statistical methods. Over the course of the year, the improvements in model accuracy amounted to $41.4 million in savings.

Distributed processing technology works as follows. One controlling workstation is connected to numerous parallel workstations. Operational and mathematical tasks are divided and shared among all connected workstations. Because the tasks are shared across these interconnected machines, the processes run faster.

For any given predictive modeling exercise, there are typically 1,500 to 3,000 distinct modeling methodologies and forms that could be applied to the problem. With the power of distributed processing, hundreds and even thousands of candidate predictive models can be evaluated 50 to 100 times faster than with conventional manual statistical packages. This not only saves time but also eases the pressure on the company to hire more analytical talent to develop these models.

The power of “distributed processing” can help businesses achieve “faster, cheaper and better” solutions to the dilemma of data overload. In the process, they can reduce costs and achieve significant and measurable improvements in marketing ROI.

Michael Wolfe
Bottom Line Analytics, LLC
Michael Wolfe is president and founder of Bottom Line Analytics, LLC, an advanced analytics and modeling consulting company based in Marietta, GA. Wolfe has held positions in market planning and research with the Kellogg Co., Fisher-Price Toys, Kraft Foods, the Coca-Cola Co. and Burke Marketing Research.

