Segmenting With Sparse MaxDiff Data
Maybe the 36 MaxDiff items were different species of trilobites?


#MarketResearch, #ResearchMethods, #MaxDiff, #ChoiceModeling, #Segmentation, #SparseData, #LatentClass

In some MaxDiff studies we have so many items that we can't easily fit enough questions to show each item to each respondent the recommended three to four times. In those cases we often opt for a "sparse" MaxDiff, one in which each respondent sees each item only once or twice.
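To make the arithmetic concrete: with quads (four items per task), the number of tasks each respondent needs is simply items × shows per item ÷ 4. A quick sketch in Python (the helper function below is mine, purely for illustration):

```python
# How many quads does each respondent need so that every item
# appears 'shows_per_item' times? (items_per_task = 4 for quads)
def tasks_needed(n_items: int, shows_per_item: int, items_per_task: int = 4) -> int:
    return (n_items * shows_per_item) // items_per_task

print(tasks_needed(36, 3))  # 27 quads -- the recommended three shows per item
print(tasks_needed(36, 2))  # 18 quads
print(tasks_needed(36, 1))  #  9 quads
```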

While even a sparse design can recover a sample's true mean utilities with great fidelity, we know that our ability to recover respondents' true utilities degrades as our MaxDiff design becomes increasingly sparse (Chrzan 2015; Chrzan and Peitz 2019). This should have implications for segmenting with MaxDiff data, but I had never quantified the deterioration in our ability to recover segments that sparseness may cause. Recently a client raised this question, so I decided to look into it.

Method

I took an existing MaxDiff data set with a large number of items (36) for which I had previously run a four-segment solution using latent class MNL. I took the mean MNL utilities for each segment and treated them as the known (true) utilities for each respondent in each of four different segments, copying each segment's set of utilities into 200 rows of data. Thus I have 200 respondents in each of four segments, for a total of 800 respondents whose four unique sets of utilities and whose segment memberships I know for certain.
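A minimal sketch of that construction, assuming we already have the four segment-level mean utility vectors from the earlier latent class run (the values below are random placeholders, not the real utilities):

```python
import numpy as np

# Placeholder stand-ins for the four segments' mean MNL utilities
# (4 segments x 36 items); in the real study these came from the
# earlier latent class solution.
segment_utilities = np.random.default_rng(1).normal(size=(4, 36))

# Replicate each segment's utilities 200 times: 800 artificial
# respondents whose true utilities and true segments are known.
true_utilities = np.repeat(segment_utilities, repeats=200, axis=0)  # (800, 36)
true_segment = np.repeat(np.arange(4), repeats=200)                 # (800,)
```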

To create the MaxDiff data files for analysis, I programmed three MaxDiff experiments into our Lighthouse Studio software:  one with 100 versions of 27 sets of quads, one with 100 versions of 18 sets of quads and one with 100 versions of nine sets of quads.  Respondents in these three experiments will see each item three times, twice and just once, respectively.  Using the data generator functionality in our software, I had each of my 800 artificial respondents answer each of the three MaxDiff experiments, using their assigned utilities and a theoretically appropriate amount of random Gumbel response error. The result was three sets of responses from each respondent, one per experiment.
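The data generator itself lives inside Lighthouse Studio, but the random-utility logic behind it can be sketched in a few lines. Here is one way to simulate a single quad, assuming best and worst are modeled as sequential MNL choices with independent Gumbel errors (a common simulation setup; the function and variable names are mine, and the real generator may differ in its details):

```python
import numpy as np

def answer_quad(true_u: np.ndarray, item_ids: np.ndarray, rng) -> tuple[int, int]:
    """Simulate one best-worst answer to a quad of items.

    Adds i.i.d. Gumbel error to each item's true utility for the 'best'
    choice, then repeats on the negated utilities of the remaining items
    for the 'worst' choice (sequential best-worst assumption).
    """
    noisy_best = true_u[item_ids] + rng.gumbel(size=len(item_ids))
    best = item_ids[np.argmax(noisy_best)]

    rest = item_ids[item_ids != best]
    noisy_worst = -true_u[rest] + rng.gumbel(size=len(rest))
    worst = rest[np.argmax(noisy_worst)]
    return best, worst

# Example: respondent 0 answers a quad containing items 3, 11, 20 and 35.
rng = np.random.default_rng(7)
best, worst = answer_quad(true_utilities[0], np.array([3, 11, 20, 35]), rng)
```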

With both the experimental designs and the response data in hand, I ran latent class MNL on each of the three experiments to produce segments. In all three cases the BIC fit statistic correctly identified a four-segment solution, which was encouraging. When we compare known segment membership to the segment membership estimated from these three analyses, however, we expect some degradation from sparseness: respondents who see each item the recommended minimum of three times should fall into their known segment more often than those who see each item only twice or once.
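For readers who haven't used it, BIC trades off model fit against the number of parameters, and the solution with the lowest BIC wins. A hedged sketch of that comparison (the log-likelihoods below are made-up placeholders, and the parameter count assumes effects-coded item utilities plus class-share parameters, which may differ from how a given package counts them):

```python
import numpy as np

def bic(log_likelihood: float, n_parameters: int, n_observations: int) -> float:
    """Bayesian Information Criterion: lower is better."""
    return -2.0 * log_likelihood + n_parameters * np.log(n_observations)

n_items, n_resp = 36, 800
placeholder_ll = {2: -21900.0, 3: -21200.0, 4: -20700.0, 5: -20690.0}  # illustrative only
for k, ll in placeholder_ll.items():
    n_params = k * (n_items - 1) + (k - 1)  # class utilities + class shares
    print(k, round(bic(ll, n_params, n_resp), 1))  # the 4-class row comes out lowest here
```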

Results

And this is exactly what happens. I measured the accuracy of segment assignments using a standard metric called the Adjusted Rand Index (ARI) and found this pattern:

[Figure: Adjusted Rand Index (ARI) for the three MaxDiff designs]

As expected, the accuracy of segment assignments falls as sparseness gets worse.
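For anyone reproducing this kind of check, the ARI is a one-liner with scikit-learn and, usefully, it does not care how the segment labels happen to be numbered in either solution. A toy example:

```python
from sklearn.metrics import adjusted_rand_score

# Two segmentations of six respondents; the labels themselves are arbitrary,
# only the groupings matter. One respondent (the last) is misclassified.
true_labels      = [0, 0, 0, 1, 1, 1]
estimated_labels = [1, 1, 1, 0, 0, 2]
print(adjusted_rand_score(true_labels, estimated_labels))  # below 1.0; 1.0 means perfect recovery
```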

A more intuitive way to report this might be to count how often each method puts respondents in the right segments, say by crosstabbing true and estimated segment membership.  For example, for the standard MaxDiff, where each respondent sees each item three times, we get this crosstab: 

[Table: crosstab of true vs. estimated segment membership, each item shown three times]

I’ve highlighted the cells where the two segmentations match up. Summing the highlighted numbers and dividing by the total sample size of 800 shows that latent class MNL put 92.25% of the respondents into the correct segments. Doing the same for the other two experiments, we can see the deterioration in the accuracy of segment assignments that comes from sparseness:

[Table: share of respondents correctly classified, by number of times each item is shown]

Sure enough, as we show each MaxDiff item less often to each respondent, our ability to accurately segment respondents decreases. 
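One wrinkle if you compute this kind of hit rate yourself is that the latent class labels are arbitrary, so you first have to match estimated classes to true classes before summing the diagonal. A sketch of that calculation, assuming both segmentations have four classes (the estimated_segment vector below is a placeholder standing in for the modal class assignments from the latent class run):

```python
import numpy as np
import pandas as pd
from scipy.optimize import linear_sum_assignment

# true_segment is known by construction; estimated_segment would come from
# latent class MNL (here a relabeled placeholder with a few deliberate misses).
true_segment = np.repeat(np.arange(4), 200)
estimated_segment = (true_segment + 1) % 4   # same groups, different labels
estimated_segment[:30] = 0                   # misclassify 30 respondents

xtab = pd.crosstab(true_segment, estimated_segment)  # 4 x 4 crosstab of counts
rows, cols = linear_sum_assignment(-xtab.values)     # label matching that maximizes agreement
hit_rate = xtab.values[rows, cols].sum() / xtab.values.sum()
print(f"{hit_rate:.2%} of respondents assigned to their true segment")
```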

Conclusion

Of course, this analysis uses robotic respondents, but we have no reason to believe we’d have greater success with human respondents (in any case we never know the true segment membership of human respondents).

Also, this is just a single study, with one particular pattern of between-segment differences, equal-sized segments and respondents programmed to answer with equal amounts of response error. This research could usefully be expanded to include:

  • Experiments with different numbers of items
  • Designs with different amounts of sparseness
  • Populations with different numbers of segments
  • Segments with different patterns of utilities
  • Segments of differing sizes
  • Respondents with different amounts of response error

But those will be jobs for another day.


References

Chrzan, K. (2015) “A parameter recovery experiment for two methods of MaxDiff with many items,” Sawtooth Software research paper, available at https://sawtoothsoftware.com/resources/technical-papers/a-parameter-recovery-experiment-for-two-methods-of-maxdiff-with-many-items

Chrzan, K. and M. Peitz (2019) “Best-worst scaling with many items,” Journal of Choice Modelling, 30: 61-72.

Comments

The fact that recapture rates are so poor for segments where everyone is a bullseye member of their segment (not possessing a range of classification probabilities, as exist in the real world) is a true indictment of the sparser designs (anything less than 3x). That said, since MaxDiff struggles so much to predict holdout choices on the actual items themselves, you're starting with a flawed importance measurement to begin with. Go with Qsort and save respondents the pain of iterating through all those redundant-seeming choices.


This is great work, Keith Chrzan, thank you for sharing this! It would be really interesting to see whether the drop in % correct from 2 full shows to 1 is linear from 18 tasks down to 9, or more of a cliff at the 1-show point. My guess is the decline gets much steeper as you get closer to only 1 show, since that's when you lose any connectivity of items across tasks and any sense of an overall ranking. I suspect that complete lack of connectivity and ranking beyond "chosen best", "not chosen", and "chosen worst" on each task is what makes it increasingly difficult for latent class to assign respondents to the correct segment.


Good work, Keith, on a question that I've often been asked and have offered opinions on without running the simulations to support it with evidence. Your results suggest a bigger drop-off in segmentation assignment accuracy when each item is shown 1x vs 2x, as opposed to the lesser drop-off in accuracy when each item is shown 2x vs. 3x. This all feels right to me, from a touchy-feely standpoint. I'd still recommend showing each item 3x per respondent for strong segmentation analysis.

I wonder if the number of task versions has any effect on the results. Is it possible that with, say, 10 versions, there could be a correlation between version and accuracy? That relationship would be hard to observe with 100 versions. I'm also curious whether the respondents who are classified incorrectly are that way at random due to nothing explainable, due to the random errors generated when creating their survey answers, or due to differences in the designs and which specific comparisons they evaluated. More ideas for future research.

