Evaluating the value of a machine learning prospectivity map
Mineral potential maps (or prospectivity maps) have been used in the industry for several decades to help select the most prospective ground for exploration. These maps are the best way to integrate all available data and help decide where to go next. In the last decade, the use of data-driven and machine learning approaches to mineral potential mapping has become mainstream. Companies and consultants (including myself) propose machine learning prospectivity analyses to their clients, and research papers are published every month on the subject.
Open source tools and public data allow everyone to try and test machine learning for mineral potential mapping. This is a great opportunity for all of us to improve current practices and propose new solutions and approaches to the mineral exploration community. However, not everyone has experience with machine learning, and not everyone is aware of the pitfalls of training an algorithm. Machine learning is complex and requires rigorous approaches. Published examples, both from industry and from academia, show workflows that would scare a machine-learning scientist, with results that nonetheless look plausible to a geoscientist.
A specific concern I want to address in this article is models evaluated on training data. A machine learning algorithm learns a model from existing training deposits. We then try to use this model to predict where we are most likely to discover new deposits. It is common practice to estimate the model's performance by looking at how many known deposits have been predicted by the model. This evaluation should NEVER be done on the deposits used to train the model. An algorithm can be extremely good at (re-)discovering its training examples, yet poor at predicting new deposits. This is called model overfitting. As an example, I trained an algorithm on some Australian public data and estimated its performance both on the training deposits and on validation deposits that were not used as input to the algorithm.
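To make the evaluation pattern concrete, here is a minimal sketch assuming a scikit-learn workflow on synthetic data. The features, model, and numbers are illustrative stand-ins, not the setup used for the Australian example:

```python
# Minimal sketch: synthetic "evidence layers" and a random forest stand in
# for a real prospectivity workflow. The key pattern is holding deposits out
# BEFORE training and scoring on them, never only on the training deposits.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 8))  # hypothetical evidence layers, one row per cell
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=2.0, size=2000) > 1.5).astype(int)

# Hold out validation cells before any training happens.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=300, random_state=0)
model.fit(X_train, y_train)

print("accuracy on training data:  ", accuracy_score(y_train, model.predict(X_train)))
print("accuracy on validation data:", accuracy_score(y_val, model.predict(X_val)))
# A large gap between these two numbers is the overfitting described above.
```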
I obtained an accuracy score of 0.96/1 on the training deposits. Amazing! But performance on the validation data was only 0.58/1... Not that good. The figure below illustrates the problem well: all training deposits fall within the 5% most prospective ground, while it takes 20% of the ground to capture just 90% of the validation deposits.
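For anyone who wants to run this kind of check on their own map, here is one way to compute the quantity behind the figure: the fraction of deposits captured within the top x% most prospective ground. The function name and toy inputs below are illustrative, not the exact code used for this analysis:

```python
# Rank all cells by predicted prospectivity, then ask what fraction of known
# deposits falls inside the top x% of ground. Compare this curve for
# training deposits and held-out validation deposits.
import numpy as np

def capture_fraction(scores, is_deposit, top_ground_fraction):
    """Fraction of deposits inside the top `top_ground_fraction` of cells,
    ranked by prospectivity score (highest first)."""
    order = np.argsort(scores)[::-1]
    n_top = int(np.ceil(top_ground_fraction * len(scores)))
    top_cells = order[:n_top]
    return is_deposit[top_cells].sum() / is_deposit.sum()

# Toy example: per-cell model scores and known deposit locations.
scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05])
is_deposit = np.array([1, 0, 1, 0, 0, 1, 0, 0])
print(capture_fraction(scores, is_deposit, 0.25))  # deposits in top 25% of ground
```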
Evaluating a model's performance on training data will misguide all subsequent decision-making. The wrong model will be chosen to produce a mineral potential map, and decision-makers will be over-confident in the map's ability to outline prospective ground. Make sure to always estimate model performance on held-out validation data, and if you are the client, always request such a validation. You can even keep a few deposits hidden and ask your consultant to try and predict them once the model is trained. Otherwise, you might end up making the wrong exploration decisions and losing money instead of better focusing your exploration.
Great work Antoine!
Machine learning prospectivity mapping is a tool. It isn't a solution. Like all tools it will have application in certain areas and will fail dismally in others. Its dependence on data and the current paradigms is its biggest downfall. Relying on ML while simultaneously trying to be the first mover into an area will be a difficult fit, as often the first mover is also the first to acquire the data the ML needs to make its assessment; by the time the ML has the data density it needs for meaningful results, it's too late. Also, how many deposits have been found that don't fit the accepted paradigm of the day and that require a whole new round of research to understand how they formed? ML will never predict a new exploration paradigm. So it's a tool, not the answer.
Antoine, great article, keep it up. Another potential trap in applying ML to exploration is that cells or blocks are never truly independent. Depending on how you set up your model, neighboring cells will have almost exactly the same raw data, because you may have interpolated data to fill cells that are incomplete (exploration data is messy and clustered). So if you selected your training set as a "random" selection, you are still learning and predicting from cells that may contain essentially the same raw data; thus your prediction of success will be biased (the smoother your interpolation, the higher your "success"). So many traps; common sense and good geology will always be necessary.
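To see this comment's point in action, here is a minimal sketch, assuming a scikit-learn setup: the features are deliberately unrelated to the labels, but each sparse measurement is "interpolated" into several near-identical cells. A random split rewards memorization of those duplicates; splitting by cluster (e.g., with GroupKFold) exposes it. All names and numbers are illustrative:

```python
# Minimal sketch of spatial leakage: each "measurement" is smeared into a
# cluster of neighboring cells by interpolation, so those cells share
# essentially the same raw data. There is deliberately NO real signal
# between features and labels, so any accuracy above ~0.5 is leakage.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(0)
n_sources, cells_per_source = 200, 8   # 200 sparse measurements, interpolated

source_X = rng.normal(size=(n_sources, 6))     # raw data at measured sites
source_y = rng.integers(0, 2, size=n_sources)  # labels, independent of X

# "Interpolation": neighboring cells copy the nearest measurement with jitter.
X = np.repeat(source_X, cells_per_source, axis=0)
X = X + rng.normal(scale=0.01, size=X.shape)
y = np.repeat(source_y, cells_per_source)
groups = np.repeat(np.arange(n_sources), cells_per_source)  # cluster ids

model = RandomForestClassifier(n_estimators=200, random_state=0)
random_cv = cross_val_score(model, X, y,
                            cv=KFold(n_splits=5, shuffle=True, random_state=0))
grouped_cv = cross_val_score(model, X, y, cv=GroupKFold(n_splits=5),
                             groups=groups)
print("random split accuracy: ", random_cv.mean())   # inflated by leakage
print("grouped split accuracy:", grouped_cv.mean())  # ~0.5: no real signal
```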