Model Selection for Optimization Under Uncertainty: Cutting Through the Noise with Representative Models

The practice of statistics has been shifting. Inference, once the central concern, now shares the stage with learning and forecasting. When the goal is forecasting or optimizing decisions, a full inferential analysis is not always necessary.

Here's an analogy: Imagine you're listening to a symphony with hundreds of instruments. But what if you could identify just a handful of those instruments that define the entire melody, allowing you to enjoy the essence of the music without the full orchestra?

Enter modern statistical learning, which is commonly divided into two branches:

  1. Supervised Learning: Think of it as a tutor guiding a student. Here, the aim is to establish a relationship between given inputs and their outputs and then predict outputs for new, unseen inputs.
  2. Unsupervised Learning: This is more like self-study. Without specific outputs, the objective is to unearth patterns and relationships among inputs. Popular techniques? PCA (principal component analysis) and clustering.

In the realm of operations research and optimization under uncertainty, decisions heavily depend on the simulation of numerous computational models. With computational resources often at a premium, a pressing question arises: Can we pinpoint a handful of models that can aptly represent the vast multitude?

This was the challenge I addressed during my research at Stanford. I crafted a methodology that uses unsupervised statistical learning to select the "representative models" needed for decision-making under uncertainty. The core idea is to cluster the computational models and then choose an ambassador from each cluster. In practice, about 80% of the effort goes into choosing features that represent each model well enough for clustering-based selection.
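To make the cluster-then-pick-an-ambassador idea concrete, here is a minimal sketch. It is not the implementation from the paper: it assumes each model has already been summarized as a numeric feature vector, uses plain k-means with a deterministic farthest-first start, and takes the model closest to each cluster centroid as the ambassador. The function name `select_representatives` and its parameters are hypothetical.

```python
import numpy as np


def select_representatives(features, n_clusters, n_iter=100):
    """Cluster models by their feature vectors and pick one ambassador per cluster.

    features: array-like of shape (n_models, n_features), one row per model.
    Returns (representatives, labels): the index of the chosen model for each
    non-empty cluster, and the cluster label of every model.
    """
    X = np.asarray(features, dtype=float)

    # Standardize so that no single feature dominates the distance.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std

    # Farthest-first initialization (an assumption made here for determinism):
    # start from model 0, then repeatedly add the model farthest from those chosen.
    chosen = [0]
    for _ in range(1, n_clusters):
        d = np.linalg.norm(Z[:, None, :] - Z[chosen][None, :, :], axis=2).min(axis=1)
        chosen.append(int(d.argmax()))
    centroids = Z[chosen]

    # Standard k-means iterations: assign to nearest centroid, then recompute centroids.
    for _ in range(n_iter):
        dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            Z[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    # Final assignment, then pick the member closest to each centroid.
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    representatives = []
    for k in range(n_clusters):
        members = np.flatnonzero(labels == k)
        if members.size:
            representatives.append(int(members[dists[members, k].argmin()]))
    return representatives, labels
```

Calling `select_representatives(features, n_clusters=5)` on an ensemble of several hundred models would return up to five model indices to simulate in full, together with cluster labels whose sizes can serve as weights for each ambassador. The choice of what goes into `features` is where most of the work lies.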

But how do we discern the right features? To answer this, the methodology includes a novel statistical method for comparing and evaluating candidate "representative subsets". The outcome? A roadmap for identifying the features that matter most for model clustering and selection.
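The statistical method itself is described in the paper and is not reproduced here. As a stand-in, the sketch below shows one simple way two candidate subsets could be compared. It assumes a scalar response (for example, an objective value) is available for every model in a benchmark case, and it scores a subset by the Kolmogorov-Smirnov distance between the subset's weighted empirical distribution and that of the full ensemble. The function name `weighted_ks_distance` and the use of cluster sizes as weights are assumptions made for illustration.

```python
import numpy as np


def weighted_ks_distance(full_responses, subset_idx, weights=None):
    """Kolmogorov-Smirnov distance between a weighted subset and the full ensemble.

    full_responses: one scalar response per model in the ensemble.
    subset_idx: indices of the candidate representative models.
    weights: optional weight per representative (e.g. cluster sizes);
             equal weights are used when omitted.
    Smaller values mean the subset reproduces the ensemble distribution better.
    """
    full = np.asarray(full_responses, dtype=float)
    sub = full[np.asarray(subset_idx)]
    if weights is None:
        w = np.full(len(sub), 1.0 / len(sub))
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()

    # Both empirical CDFs are step functions that jump only at observed
    # responses, so evaluating them at every ensemble value is sufficient.
    grid = np.sort(full)
    cdf_full = np.searchsorted(grid, grid, side="right") / len(full)
    cdf_sub = np.array([w[sub <= x].sum() for x in grid])
    return float(np.abs(cdf_full - cdf_sub).max())
```

With a score like this, each candidate feature set can be used to cluster the ensemble and pick ambassadors, and the feature sets can then be ranked by how closely their subsets track the full ensemble.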

For the deep-divers, the intricacies of this methodology and its application, especially in subsurface flow processes, can be found in the published paper in Computers & Geosciences.


More articles by Mehrdad G.
