Tuesday, August 31, 2010

Marketing Analytics: Understanding Segmentation and Prediction

By Alberto Roldan, Copyright 2010 Alberto Roldan
Campaign management, media mix optimization, cross-sell, and up-selling are some of the terms that are commonly used in marketing analytics. Although those terms are understood in a business context, the advanced analytics techniques behind those terms are not as well understood. The purpose of this article is to explain some of the analytics techniques used in marketing analytics, such as segmentation and predictive analytics, so executives understand the capabilities and limitations involved. A secondary objective is to help analytics professionals explain concepts to businesses.

Segmentation is the process of dividing a large market into groups that have similar characteristics. There are two main issues that I have found in explaining this process to businesses: 1) the different processes used to arrive at granular vs. aggregated segmentations; and 2) the difference between “similar” and “equal” characteristics within a segment.

Granular vs. Aggregated Segmentations
Companies need to be able to separate the clusters of data and identify the driving factors for marketing and sales purposes. Hence, a limited number of segmentations that are directly related to the core business is necessary for strategic and tactical decision making. This limited number of segmentations (about 6) is what I refer to as aggregated segmentations. Examples of aggregated segmentations are best customers, next best, infrequent buyers, and power buyers. Arriving at a consensus on aggregated segmentations combines business knowledge and experience with statistical analysis of granular segmentations. Aggregated segmentations allow businesses to efficiently analyze large datasets and make decisions that will impact profit, revenues, and costs.

Granular segmentation refers to separating the clusters of data using advanced analytics techniques such as hierarchical partitioning, k-means clustering, distribution analysis, and correlation analysis. Therefore, the main difference between the processes of arriving at granular and aggregated segmentations is that the former is determined by using proven mathematical techniques, while the latter combines business knowledge with advanced analytics techniques. A frequently asked question is: why do we need to do granular segmentations before arriving at aggregated segmentations? The answer is that analytics requires following scientific methodology (observations, theory, experiment, and outcome). The scientific method allows for objectivity, reduces bias in the interpretation of results, and brings measurable precision to the process. Moreover, the accuracy of a prediction is based on granular segmentations rather than aggregated segmentations.
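As a rough illustration of how a granular segmentation might be computed, here is a minimal k-means sketch in plain Python. The customer data, the two variables (purchases per month and average order value in tens of dollars), and the farthest-point initialization are all assumptions made for the example; a real project would use many more variables and a tested analytics library.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return [list(c) for c in centroids]

def kmeans(points, k, iterations=20):
    """Plain k-means; returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each customer to the nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Hypothetical customers: (purchases per month, average order value in tens of dollars)
customers = [
    (1, 2), (2, 3), (1, 3),      # infrequent, low spend
    (12, 3), (14, 2), (13, 4),   # frequent, low spend
    (2, 20), (3, 22), (1, 21),   # infrequent, high spend
]

centroids, labels = kmeans(customers, k=3)
for customer, label in zip(customers, labels):
    print(customer, "-> granular segment", label)
```

The two variables were deliberately put on comparable scales. With raw dollars next to purchase counts, the larger-valued variable would dominate the distance calculation, so in practice the variables are usually standardized first.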

Similar Characteristics vs. Equal Characteristics
A challenge encountered in attempting to explain segmentation to businesses is the difference between similar vs. equal characteristics in a segment. The members of a granular segment might have equal characteristics and, hence, be homogeneous. The members of an aggregated segment will have similar but not equal characteristics. The granular segment characteristics are determined by using advanced analytics techniques; therefore, the homogeneity of the segments is determined with mathematical precision.

On the other hand, aggregated segments are determined by combining homogeneous granular segments with business value and experience. Since multiple distinct granular segments are combined in aggregated segmentations, those segments should have similar but not equal characteristics. For example, the best customer segment will have customers with similar characteristics such as frequency of purchase, but not all the frequencies will be the same (e.g., once a week, twice a week, or daily).
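The roll-up from granular to aggregated segments can be sketched as a simple business-rule mapping. The segment ids, customer counts, and the mapping itself are hypothetical; the point is only that one aggregated segment ends up holding several distinct purchase frequencies.

```python
from collections import defaultdict

# Hypothetical granular segments, each internally homogeneous.
granular = {
    "g1": {"purchases_per_week": 7, "customers": 120},     # daily buyers
    "g2": {"purchases_per_week": 2, "customers": 340},     # twice a week
    "g3": {"purchases_per_week": 1, "customers": 510},     # once a week
    "g4": {"purchases_per_week": 0.25, "customers": 900},  # about once a month
}

# Assumed business decision: which granular segments belong together.
aggregated_of = {
    "g1": "best customers",
    "g2": "best customers",
    "g3": "best customers",
    "g4": "infrequent buyers",
}

def roll_up(granular_segments, mapping):
    """Combine granular segments into aggregated ones, keeping each distinct frequency."""
    summary = defaultdict(lambda: {"customers": 0, "frequencies": []})
    for segment_id, profile in granular_segments.items():
        aggregated = summary[mapping[segment_id]]
        aggregated["customers"] += profile["customers"]
        aggregated["frequencies"].append(profile["purchases_per_week"])
    return dict(summary)

segments = roll_up(granular, aggregated_of)
for name, info in segments.items():
    print(name, info["customers"], "customers, purchases per week:", sorted(info["frequencies"]))
```

Keeping the list of underlying frequencies alongside the aggregated segment lets a decision maker work with a handful of segments without losing sight of the differences inside each one.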

Understanding the differences within an aggregated segment is important because it allows decision makers to be precise in their strategic decisions while still working with a manageable set of segments.

Predictive Analytics
Availability and Data Quality
Predictive analytics refers to the ability to accurately predict an event or occurrence, for example, that a customer will purchase a product or service at a set price or within a certain price range. This area is so broad that I am only going to address two issues: 1) data availability and quality; and 2) accuracy of prediction.

A baby must first learn to crawl before it can run. Although this concept is fairly obvious, its application in marketing analytics is sometimes not well understood. In order to make a prediction, data must be available and of acceptable quality. When a new product is launched into the market, predictions are difficult because of the lack of data. Sometimes the initial analytics outcome is limited to comparing the characteristics of the new product with those of a similar existing product. The next step is to make an inference (a weighted value) that the new product may perform similarly to the existing product. As the new product gains traction in the market and its data becomes available, the accuracy of any prediction will substantially improve. This new data will prove, disprove, or modify the inference that was initially made. Availability of data also means that the data is accessible in the correct format for analysis.

Data quality refers to the percentage of individual variables that have correct information, as well as how the aggregate data quality issues impact the accuracy of any prediction. The old computer science axiom “garbage in, garbage out” holds true in marketing analytics. Therefore, it is crucial that a thorough ETL (extract, transform, and load) process, including a data quality hub, be in place prior to attempting any enterprise predictive analytics. In other words, this is the seam where best practices in business intelligence (BI) and advanced analytics meet. It is important to remember that the accuracy of any prediction is directly tied to the quality of your data. Therefore, executives should address data quality issues at the beginning of any analytics project.
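One way to make data quality concrete is to measure, per variable, the share of records that pass a validity check. The sketch below assumes a tiny hypothetical customer table and hand-written validation rules; the field names and thresholds are illustrative only.

```python
def quality_report(rows, validators):
    """Return, for each variable, the share of rows whose value passes its validator."""
    report = {}
    for name, is_valid in validators.items():
        passed = sum(1 for row in rows if is_valid(row.get(name)))
        report[name] = passed / len(rows)
    return report

# Hypothetical customer records with typical problems: missing, out-of-range, malformed.
rows = [
    {"customer_id": "C001", "age": 34, "email": "a@example.com"},
    {"customer_id": "C002", "age": None, "email": "b@example.com"},
    {"customer_id": "C003", "age": 212, "email": "not-an-email"},
    {"customer_id": "C004", "age": 51, "email": ""},
]

# Assumed validation rules; real rules would come from the business and the data owners.
validators = {
    "customer_id": lambda v: bool(v),
    "age": lambda v: isinstance(v, int) and 0 < v < 120,
    "email": lambda v: isinstance(v, str) and "@" in v,
}

report = quality_report(rows, validators)
for variable, share in report.items():
    print(f"{variable}: {share:.0%} valid")
```

A report like this, run at the start of a project, shows which variables are fit for modeling and which need cleanup before any prediction is attempted.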

Accuracy of Prediction
There are two issues that I would like to address regarding accuracy of prediction: variables and analytics tools. In the IT and BI world we speak of fields or data elements. In the analytics world, we talk about variables. A dataset may have hundreds of data elements, but analytics uses a limited number of relevant and pertinent variables. In order to understand whether their company can successfully implement a predictive analytics project, business decision makers must be able to distinguish the fundamental differences between the IT and analytics languages. I like to think about this as the difference between learning how to say “food” in English and in Chinese: both words are necessary if you want to eat in each country.

One of the most common mistakes in predictive analytics is thinking that if we simply input data elements into a predictive analytics tool such as SAS, SPSS, or KXEN, we are going to obtain accurate predictions. This reflects a misunderstanding of the internal workings of regression algorithms. Regression works on a set of independent variables and a dependent variable, and it treats each independent variable as separate from the others. If a relationship between two independent variables, such as a ratio or a difference, is pertinent and relevant to a prediction, that relationship must be created as an independent variable in its own right. For example, if the variables are “date of first purchase” and “date of last purchase,” and you think that the relevant variable is “days between purchases,” then you need to create this variable. Otherwise, the regression algorithm reads the two dates as independent from one another. Experience in variable creation is one of the areas business decision makers should examine when evaluating a predictive analytics project. The accuracy of a prediction is directly related to the variables used in the analytics model.
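The variable-creation step described above can be sketched in a few lines of Python. The records, field names, and spend figures are hypothetical toy data (the spend values were chosen to lie exactly on a line so the fit is easy to check), and the one-variable least-squares fit stands in for whatever regression tool is actually used.

```python
from datetime import date

def add_days_between(record):
    """Derive 'days between purchases' from the first and last purchase dates."""
    enriched = dict(record)
    enriched["days_between_purchases"] = (
        record["last_purchase"] - record["first_purchase"]
    ).days
    return enriched

def fit_line(xs, ys):
    """Ordinary least squares for one independent variable; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical customers; 'spend' is the dependent variable.
customers = [
    {"first_purchase": date(2010, 1, 1), "last_purchase": date(2010, 1, 31), "spend": 65},
    {"first_purchase": date(2010, 3, 1), "last_purchase": date(2010, 6, 9), "spend": 205},
    {"first_purchase": date(2010, 2, 1), "last_purchase": date(2010, 2, 11), "spend": 25},
]

enriched = [add_days_between(c) for c in customers]
days = [c["days_between_purchases"] for c in enriched]
spend = [c["spend"] for c in enriched]

# The derived variable, not the two raw dates, is what the regression sees.
slope, intercept = fit_line(days, spend)
print(days, slope, intercept)
```

Had the two raw dates been fed to the regression as separate inputs, the span between them would never appear as a variable of its own; creating it explicitly is what puts the analyst's hypothesis in front of the model.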

Analytics Tools
I have found that companies want to talk about analytics tool evaluation right away when considering a predictive analytics project. This tendency is driven by IT experience with estimating the cost of software, hardware, and staffing with qualified resources. Although analytics tool selection is an integral part of any predictive analytics project, it is neither the most important consideration, nor should it be the driving motivation for an analytics project. For example, I can go to a hardware store and buy the best carpenter tools, but if I do not understand their proper use, my success rate in building anything will be significantly reduced. A master carpenter with an old hammer and a hand saw will build a house faster and better than a novice with the best power tools.

One of the most important analytics decisions that executives tend to overlook is creating a separate environment for advanced analytics. I have found that executives do not always realize that predictive analytics consumes a large amount of memory and tends to degrade the performance of operational systems when it runs alongside them. The solution is fairly simple: build your analytics engine in a separate environment.

A recent survey found that three out of four executives understand that predictive analytics is essential to the operations of their business. Decision makers can and should use proven advanced analytics techniques to improve profitability. If executives learn the fundamentals of business analytics, including its possibilities and limitations, they will be able to make better informed decisions about investing in these new technologies.
