Avoid the Scale Confound for Better Needs-Based Segmentation

#Segmentation, #ChoiceModeling, #LatentClass, #ConjointAnalysis, #MaxDiff, #NeedsBasedSegmentation, #MarketingResearch, #DataScience, #QuantResearch (with Drew McGinnis)

Scale effects can cause headaches for segmentation analysts. Differences in variable scales (ratings, counts, categoricals) add steps to our analysis or lead us to more complex segmentation algorithms altogether (Chrzan 2025a, 2025b). Even if we restrict ourselves to a single scale type, scale use bias can create thorny problems (Chrzan 2025c).

A different kind of scale confound can affect the results of choice models. Louviere and Eagle (2006) illustrated the effect that differences in respondent consistency can have on the utilities we get from our conjoint and MaxDiff studies: more consistent respondents will have utilities larger in absolute magnitude, while less consistent respondents will have smaller ones. These differences in size owe to the logit “scale factor,” which creates another kind of scaling challenge for segmentation analysts. Note that this problem is bigger than the logit model; it will affect any statistical model. Even simple regression analysis will produce smaller coefficients from less consistent data and larger coefficients from more consistent data.
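A small simulation (my own sketch, not from the studies cited here) makes the scale factor concrete: two groups of respondents share identical preference weights but differ in choice consistency, and fitting a logistic regression to each group recovers coefficients of very different magnitude.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
beta = np.array([1.0, -0.5])  # true preference weights, identical for both groups

def simulate_and_fit(scale, n=5000):
    """Simulate binary choices from a logit model with the given scale factor,
    then recover the coefficients with logistic regression."""
    X = rng.normal(size=(n, 2))                 # attribute differences between two options
    p = 1 / (1 + np.exp(-scale * (X @ beta)))   # scale multiplies the utilities
    y = rng.random(n) < p
    return LogisticRegression().fit(X, y).coef_[0]

print(simulate_and_fit(scale=2.0))  # consistent respondents: larger coefficients
print(simulate_and_fit(scale=0.5))  # noisy respondents: smaller coefficients
```

Both groups want the same things; only their response consistency differs, yet the estimated utilities differ several-fold in magnitude.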

If preference and scale are confounded, this will confuse the segmentation algorithm we use for choice models, latent class multinomial logit (MNL). The basic latent class MNL model looks to create segments that differ in their utilities, but it doesn’t know how to separate the preference and scale components of those utilities.

Fortunately, we have not one but two solutions to this problem. Magidson and Vermunt (2007) built scale-adjusted latent class (SALC) estimation into the Latent Gold software. SALC creates separate latent classes for preference and latent classes for scale. In separating and quantifying both scale and preference differences, SALC assigns each respondent to both a scale segment and a preference segment. This allows the analyst to identify preference segments that have the scale confound removed. It also allows the analyst to examine and interpret the scale segments, potentially to understand what makes for more and less consistent respondents.

A few years later, Orme (2013) identified a second way to remove the scale confound, scale-constrained latent class (SCLC). SCLC is available in Sawtooth’s Lighthouse Studio platform and the CBC/LC software package. SCLC treats scale as a mere nuisance to be eliminated, so it generates latent classes that differ in their preferences while extracting and then ignoring the information about scale.
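Sawtooth’s exact SCLC procedure isn’t reproduced here, but the core intuition is easy to illustrate: if scale merely stretches or shrinks a respondent’s whole utility vector, normalizing each vector to a common standard deviation strips the scale component and leaves only relative preferences. The sketch below is an illustration of that idea, not Sawtooth’s implementation.

```python
import numpy as np

def remove_scale(utilities):
    """Illustrative scale normalization (not Sawtooth's exact SCLC procedure):
    rescale each respondent's utility vector to unit standard deviation so
    respondents who differ only in consistency become comparable."""
    u = np.asarray(utilities, dtype=float)
    centered = u - u.mean(axis=1, keepdims=True)   # zero-center each row
    sd = centered.std(axis=1, keepdims=True)
    return centered / np.where(sd == 0, 1.0, sd)   # guard against flat rows

# Two respondents with identical preferences but different scale factors:
base = np.array([1.0, 0.5, -1.5])
pair = np.vstack([2.0 * base, 0.5 * base])
print(np.allclose(remove_scale(pair)[0], remove_scale(pair)[1]))  # True
```

After normalization the two respondents are indistinguishable, which is exactly what a preference-only segmentation should see.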

My colleague @Drew McGinnis and I compared SALC and SCLC at last week’s Turbo Choice Modeling event. We created 6 data sets where each respondent had a known preference segment membership and a known scale factor and across data sets we varied the size of the preference classes and the distributions of the scales. Here’s a summary of the six data sets, each of which had three preference segments:

Table 1: Study Design

We then ran both SALC and SCLC on each data set and used the BIC statistic to identify the correct number of segments. We can see below that the base latent class MNL model, which doesn’t account for the scale confound, consistently overestimates the number of segments, while both SALC and SCLC get the number of segments correct more often. The mean absolute deviation (MAD) summarizes the performance of the three models and suggests a possible edge for SALC:

Table 2: Number of Segments
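For readers unfamiliar with the model selection step: BIC penalizes log likelihood by the number of parameters times the log of the sample size, and the solution with the lowest BIC wins. A sketch with made-up fit statistics (the numbers below are purely illustrative, not from our study):

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# Hypothetical log likelihoods and parameter counts for 2-5 class solutions:
fits = {2: (-4200.0, 21), 3: (-4050.0, 32), 4: (-4030.0, 43), 5: (-4025.0, 54)}
n_obs = 3000  # hypothetical number of observed choice tasks

scores = {k: bic(ll, p, n_obs) for k, (ll, p) in fits.items()}
best = min(scores, key=scores.get)
print(best)  # 3: added classes beyond 3 don't improve fit enough to pay the penalty
```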

To measure how well the methods do at assigning the right respondents to the right preference segments, we use a statistic called the Adjusted Rand Index (ARI). ARI measures how much two segmentations agree. An ARI of 0.0 means two segmentations are no more similar than you’d expect from chance alone; an ARI of 1.0 means they match perfectly. Below we report the ARI of each of our three models for each of our six data sets:

Table 3: Accuracy of Assignment
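ARI is straightforward to compute; scikit-learn’s `adjusted_rand_score`, for example, handles the chance correction and is invariant to how the segments happen to be labeled (the toy memberships below are made up for illustration):

```python
from sklearn.metrics import adjusted_rand_score

true_seg = [1, 1, 1, 2, 2, 2, 3, 3, 3]  # known segment memberships
perfect  = [3, 3, 3, 1, 1, 1, 2, 2, 2]  # same partition, labels permuted
shuffled = [1, 2, 3, 1, 2, 3, 1, 2, 3]  # unrelated assignment

print(adjusted_rand_score(true_seg, perfect))   # 1.0: label names don't matter
print(adjusted_rand_score(true_seg, shuffled))  # negative here: worse than chance
```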

On average the ARI of the naïve base latent class MNL model suggests poorer performance than either SALC or SCLC, while the latter two are very similar. Note that for data sets 5 and 6, where we have preference segments of unequal size, SCLC performs better than SALC. 

Interestingly, though SALC and SCLC separate preference classes from scale about equally well on average, they do not find the exact same preference segment solutions. Looking at the ARI of SALC vs SCLC makes this clear:

Table 4: Agreement of SALC and SCLC

The two differ most in data sets 5 and 6, where the preference segments were of unequal size.

It might be interesting to see a more comprehensive study with more artificial data sets, with differing numbers and sizes of preference segments and where the scale distribution might be correlated or not with preference segments (in our study scale differences were independent of preference segment membership). For now, however, it looks like we have two viable ways of addressing the scale confound in needs-based segmentation studies using MaxDiff or conjoint models.

References

Chrzan, K. (2025a) “Segmentation with mixed scale data – a comparison,” LinkedIn, https://www.garudax.id/pulse/segmenting-mixed-scale-data-comparison-keith-chrzan-afzbe/

Chrzan, K. (2025b) “Handling mixed variable types in segmentation,” LinkedIn, https://www.garudax.id/pulse/handling-mixed-variable-types-segmentation-keith-chrzan-tl6xe/

Chrzan, K. (2025c) “Using rating scales in segmentation studies is at least twice as bad an idea as you think,” LinkedIn, https://www.garudax.id/pulse/using-rating-scales-segmentation-studies-least-twice-bad-keith-chrzan-4zk1e/

Louviere, J. and T. Eagle (2006), “Confound It! That Pesky Little Scale Constant Messes up our Convenient Assumptions,” Sawtooth Software Conference Proceedings, Sequim, WA, 211-228.

Magidson, J. and J.K. Vermunt (2007), “Removing the Scale Factor Confound in Multinomial Logit Choice Models to Obtain Better Estimates of Preference,” Sawtooth Software Conference Proceedings, Sequim, WA, 139-154.

Orme, B. (2013) “Scale Constrained Latent Class,” Sawtooth Software Research Paper Series, accessed online on May 12, 2025.

Pretty cool, Keith. Is this the scale factor that is usually constrained to be = 1 in logit models, or something else? It’s something else, right?


I'm such a newbie at all this, so I appreciate all the material you have been sharing!


I would love it if you could show me mathematically how this removes the actual scale in a latent class scale choice model. Until then I think this is simply an ad-hoc normalization of the standard deviations of the parameters across segments. The SALC model does mathematically measure and control for scale.

Thanks for examining this, together with Drew, Keith! Somewhat surprising, but gratifying, that the simple approach for dealing with scale differences in latent class MNL that I proposed in 2013 did as well as it did. It's nice when simple methods work out. Though, the SCLC approach I suggested isn't quite as rigorous as SALC, making the assumption that scale can be characterized by standard deviation of the vector of utilities. There are other normalizations that could have been used.

