The measurement of customer loyalty has been a hot topic lately. With the latest critiques of the Net Promoter Score coming in from both practitioners and academic researchers, there is much debate on how companies should measure customer loyalty. I wanted to formally write down my thoughts on this topic to get feedback from this community of users. Much of what I will present here will be included in the third edition of my book, Measuring Customer Satisfaction. I welcome your thoughts and critiques. Because the discussion is long, I have broken it into several parts and will post one each week. Here is Part 3 of the discussion. If you missed them, read Part 1 and Part 2.
The results of the factor analyses support the use of composite scores, each representing one of the loyalty dimensions. These composite scores are referred to as scales, indices, or metrics. Each one is calculated by averaging the items that load on the same factor. Based on the results of the present analyses, we can calculate three indices:
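As an aside on the averaging step itself, here is a minimal sketch of how a composite score could be computed for one respondent. The item names (`adv_1`, `pur_1`, and so on) and the item-to-factor mapping are hypothetical placeholders, not the actual survey items.

```python
from statistics import mean

# Hypothetical mapping of each factor to the items that load on it.
FACTOR_ITEMS = {
    "advocacy": ["adv_1", "adv_2", "adv_3"],
    "purchasing": ["pur_1", "pur_2"],
}


def composite_scores(response, factor_items=FACTOR_ITEMS):
    """Average the items that load on each factor for one respondent."""
    return {
        factor: mean(response[item] for item in items)
        for factor, items in factor_items.items()
    }


# One hypothetical respondent's ratings on a 0-10 scale.
respondent = {"adv_1": 9, "adv_2": 8, "adv_3": 10, "pur_1": 4, "pur_2": 6}
scores = composite_scores(respondent)
print(scores)
```

Averaging (rather than summing) keeps each index on the same scale as the original items, which makes indices built from different numbers of items easier to compare.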
When using scales from surveys to measure constructs, we need to be concerned about the quality of the scales. The quality of these scales is typically discussed with respect to reliability and validity. Reliability refers to the degree to which scores are free from measurement error. Validity refers to the degree to which the scale measures what it was designed to measure. Before we use these new scales, I will briefly discuss these measurement principles in the context of classical test theory.
Classical Test Theory
Classical test theory is based on the idea that an observed score (X) from a survey can be decomposed into two different scores, a true score (T) and an error score (E) where:
X = T + E
As the equation implies, as error decreases, the observed score (X) more closely matches the underlying true score (T). Classical test theory is concerned with the relationships among the three variables, X, T and E. The relationships among these three components are used to understand the quality of the scores that result from the scales. The first quality of measurement, reliability, is concerned with the relationship between the observed score (X) and the true score (T).
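To make the decomposition concrete, the sketch below simulates X = T + E with made-up numbers. It assumes normally distributed true scores and errors, and it uses the standard classical test theory definition of reliability as the share of observed-score variance that is true-score variance, var(T) / var(X). In real survey data T is never observed, so this is only an illustration of the idea, not an estimation method.

```python
import random
from statistics import pvariance

rng = random.Random(42)  # fixed seed so the sketch is repeatable


def simulate(n, true_sd, error_sd):
    """Draw n true scores T and random errors E; return (T, X) with X = T + E."""
    true_scores = [rng.gauss(0, true_sd) for _ in range(n)]
    observed = [t + rng.gauss(0, error_sd) for t in true_scores]
    return true_scores, observed


def reliability(true_scores, observed):
    """Share of observed-score variance that is true-score variance."""
    return pvariance(true_scores) / pvariance(observed)


# As the error spread shrinks, the ratio moves toward 1.
for error_sd in (2.0, 1.0, 0.5):
    t, x = simulate(10_000, true_sd=1.0, error_sd=error_sd)
    print(f"error SD = {error_sd}: reliability is about {reliability(t, x):.2f}")
```

With a true-score standard deviation of 1, the printed values should land near 1 / (1 + error_sd²), so less error means an observed score that tracks the true score more closely.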
Reliability is the degree to which measurements are free from random errors. Reliability deals with precision or consistency of measurement. Scales or indices with high reliability are better at distinguishing people on the continuum of customer loyalty. Our goal in developing customer loyalty indices is to have a measurement instrument that delivers reliable results. Reliability can be thought of as the relationship between the true underlying score and the observable score we get from our survey. Random error decreases the measurement’s reliability; that is, as random error is introduced into measurement, the observed score becomes a poorer reflection of the true underlying score. For one to feel confident that a questionnaire’s scores accurately reflect the underlying dimension, the questionnaire must have high reliability. Although many types of reliability exist, internal consistency reliability is especially important for multi-item surveys.
Internal consistency indicates the extent to which the items in the measurement are related to each other. The higher the interrelationship among the items, the higher the internal consistency. If a questionnaire is designed to measure one underlying construct, the items are expected to be related to each other – that is, people who respond in one way to an item are likely to respond the same way to the other items in the measure.
There are several statistical indices used to estimate the degree of internal consistency. The most commonly used index is Cronbach’s coefficient alpha (Cronbach, 1951). Basically, this alpha coefficient indicates the degree to which items are related to each other. Cronbach’s alpha increases when the correlations among the items increase. In practice, Cronbach’s alpha typically ranges from 0 to 1.0 (negative values can occur when the items are, on average, negatively correlated, which usually signals a scoring problem). A reliability of 0 indicates that the observed score is not related to the underlying true score; a reliability of 1 indicates that the observed score is a perfect indicator of the underlying true score. Generally, a reliability of .8 or greater is considered an acceptable level of reliability.
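For readers who want to see the calculation, here is a minimal sketch of coefficient alpha using the usual variance form: alpha = k / (k − 1) × (1 − sum of item variances / variance of the total score), where k is the number of items. The ratings are invented for illustration, and the function name `cronbach_alpha` is my own.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Coefficient alpha for rows of item scores (one row per respondent)."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    if any(len(row) != k for row in rows):
        raise ValueError("every respondent must answer every item")
    columns = list(zip(*rows))                       # one tuple per item
    item_vars = [variance(col) for col in columns]   # sample variance per item
    total_var = variance([sum(row) for row in rows]) # variance of summed score
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 0-10 ratings: five respondents, three items.
ratings = [
    [9, 8, 9],
    [7, 7, 6],
    [3, 4, 2],
    [10, 9, 10],
    [5, 6, 5],
]
alpha = cronbach_alpha(ratings)
print(f"alpha = {alpha:.3f}")
```

Because respondents in this toy data answer the three items in much the same way, the item variances are small relative to the variance of the total score, and alpha comes out high.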
There are a couple of key benefits to using customer loyalty indices that have high reliability. First, customer loyalty scales with high reliability are better able to distinguish between varying levels of customer loyalty than loyalty scales with low reliability. Because scales with higher reliability have excellent precision, they are able to distinguish small differences in loyalty. Second, using a loyalty scale with high reliability, you are more likely to find significant relationships with other variables when loyalty is truly related to them.
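The second benefit can be illustrated with the attenuation formula from classical test theory, which is a standard result rather than something derived in this post: the correlation we expect to observe equals the true correlation multiplied by the square root of the product of the two reliabilities. The true correlation and reliabilities below are hypothetical.

```python
from math import sqrt


def attenuated_correlation(true_r, rel_x, rel_y):
    """Expected observed correlation given the true correlation and reliabilities."""
    if not (0 <= rel_x <= 1 and 0 <= rel_y <= 1):
        raise ValueError("reliabilities must be between 0 and 1")
    return true_r * sqrt(rel_x * rel_y)


# Hypothetical case: loyalty truly correlates .50 with some other variable
# that is itself measured with reliability .90.
for loyalty_reliability in (0.9, 0.7, 0.5):
    r = attenuated_correlation(0.5, loyalty_reliability, 0.9)
    print(f"loyalty reliability {loyalty_reliability}: expected observed r = {r:.2f}")
```

The lower the reliability of the loyalty scale, the more the observed correlation shrinks below the true value, and the harder a real relationship becomes to detect.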
Although reliability is an important ingredient in the evaluation of a questionnaire, it cannot solely determine the quality of the questionnaire. The questionnaire’s validity must also be addressed.
Validity refers to the degree to which evidence supports the inferences made from scores derived from measurements, or the degree to which the scale measures what it is designed to measure. Unlike reliability, there is no single statistic that provides an overall index of the validity of inferences about the scores.
The methods for gathering evidence of validity can be grouped into three categories: content-related evidence, criterion-related evidence, and construct-related evidence. These labels simply enable people to discuss the types of information that might be considered when determining the validity of the inferences.
Content-related evidence is concerned with the degree to which the items in the questionnaire are representative of a “defined universe” or “domain of content.” The domain of content typically refers to all possible items that could have been used in the questionnaire. The goal of content-related validity is to have a set of items that best represent the defined universe.
Criterion-related evidence is concerned with examining the systematic relationship (usually in the form of a correlation coefficient) between the loyalty scale and another measure, or criterion. In this case, what the criterion is and how it is measured are of central importance. The main question to be addressed in criterion-related validity is how well the scale can predict the criterion.
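As a small illustration of the correlation mentioned above, the sketch below computes a Pearson correlation between a loyalty index and a criterion. Both lists are invented, and the criterion (a count of later purchases) is only an assumed example of what a criterion might be.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mx, my = mean(xs), mean(ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = sqrt(sum((y - my) ** 2 for y in ys))
    return cross / (spread_x * spread_y)


# Hypothetical data: loyalty index scores and a later purchase count.
loyalty_index = [2, 4, 5, 7, 9]
later_purchases = [1, 2, 2, 4, 5]
r = pearson_r(loyalty_index, later_purchases)
print(f"criterion-related correlation r = {r:.2f}")
```

The same function could be used for construct-related evidence by correlating the scale with other scales, expecting high values for measures of the same construct and low values for measures of a different one.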
Construct-related evidence is concerned with the questionnaire as a measurement of an underlying construct. Unlike criterion-related validity, the primary focus is on the scale itself rather than on what the scale predicts. Construct-related evidence is derived from both previous validity strategies. A high degree of correlation between the scale and other scales that purportedly measure the same construct is evidence of construct-related validity. Construct-related validity can also be evidenced by a low correlation between the scale and other scales that measure a different construct.
The figure below illustrates the distinction between reliability and validity. Recall that reliability deals with precision/consistency while validity refers to the meaning behind the scores. The diagram consists of four targets, each with four shots. In the upper left-hand target, we see that there is high reliability in the shots that were fired, yet the bull’s-eye has not been hit. This is akin to having a scale that has high reliability but is not measuring what it was designed to measure (not valid). In the lower right target, the pattern indicates that there is little consistency in the shots but that the shots are all around the bull’s-eye of the target (valid). The pattern of shots in the lower left target illustrates low consistency/precision (no reliability) and an inability to hit the target (not valid). The upper right pattern of shots represents our goal: precision/consistency in our shots (reliability) as well as hitting the bull’s-eye of the target (validity).
Reliability of Loyalty Indices
Reliability estimates were calculated for each of the loyalty indices. For the Wireless Service Provider sample, the reliability (Cronbach’s alpha) of the Advocacy Loyalty Index (ALI) was .92. The reliability estimate (Cronbach’s alpha) for the Purchasing Loyalty Index (PLI) was .82. For the Personal Computer Manufacturer sample, the reliability of the ALI was .94. The reliability of the PLI was .87. These levels of reliability are considered very good for attitude research. The high reliability of each of the scales suggests that there is minimal measurement error associated with each composite score. Thus, we can feel confident that the observed scores (X) we get from the survey results are a very good reflection of the underlying true scores (T).
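One rough way to read these numbers, assuming alpha is taken as the reliability estimate in the classical test theory sense, is that 1 minus alpha approximates the share of observed-score variance attributable to random error. The sketch below applies that arithmetic to the four values reported above; the helper name `error_share` is my own.

```python
# Cronbach's alpha values reported in this post.
reported_alpha = {
    ("Wireless Service Provider", "ALI"): 0.92,
    ("Wireless Service Provider", "PLI"): 0.82,
    ("Personal Computer Manufacturer", "ALI"): 0.94,
    ("Personal Computer Manufacturer", "PLI"): 0.87,
}


def error_share(alpha):
    """Approximate share of observed-score variance due to random error."""
    return 1.0 - alpha


for (sample, index), alpha in reported_alpha.items():
    print(f"{sample} {index}: alpha = {alpha:.2f}, error share is about {error_share(alpha):.0%}")
```

For example, the ALI in the Wireless Service Provider sample (alpha of .92) leaves roughly 8% of the score variance to random error under this reading.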
Validity of Loyalty Indices
Establishing the validity of the loyalty scales is a more complex process and will be discussed in the next post.
You can download a free copy of executive reports on the two studies (Wireless Service Providers and PC Manufacturers) at Business Over Broadway.
Allen, M. J., & Yen, W. M. (2002). Introduction to Measurement Theory. Long Grove, IL: Waveland Press.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297-334.