A multiple-step solution
(Continued)
Although the problem looks like a binary classification problem, and we will employ binary classification methods to build the final solution, it is not a straightforward application of binary classification. We cannot simply deploy a standard classifier, learning a model M from training data consisting of the negative and positive sets (H and Q), and use it to find BMW owners in H.
The main reason is that binary classification methods are designed to find the best "line" separating two classes, not to detect false negatives or positives. The methods would therefore focus mainly on finding the simplest line separating H and Q, rather than on building models that represent the characteristics of H or Q well. A model M learnt from H and Q would pick up as few characteristics and properties of H and Q as possible, as long as they suffice to distinguish H from Q. Such a model would represent neither H nor Q in general, and would therefore have limited power in detecting false negatives and positives. In other words, using models learnt by binary classification methods to detect false negatives would give us limited success.
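To make this concrete, here is a minimal sketch (on hypothetical synthetic data, not the BMW data) showing that a discriminative model latches onto whichever feature separates the classes and largely ignores features describing within-class structure. Feature 0 separates the two groups; feature 1 has the same distribution in both, so the fitted weight on it stays near zero even though it might matter for modelling each class well:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: feature 0 separates "Q" from "H",
# feature 1 carries within-class structure only (same in both classes).
n = 500
X_q = np.column_stack([rng.normal(2.0, 1.0, n), rng.normal(5.0, 1.0, n)])
X_h = np.column_stack([rng.normal(-2.0, 1.0, n), rng.normal(5.0, 1.0, n)])
X = np.vstack([X_q, X_h])
y = np.concatenate([np.ones(n), np.zeros(n)])

# Plain logistic regression fitted by gradient descent.
w = np.zeros(2)
b = 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
    g = p - y                                # gradient of the log loss
    w -= 0.1 * (X.T @ g) / len(y)
    b -= 0.1 * g.mean()

print(w)  # the weight on feature 0 dominates; feature 1 is largely ignored
```

The model separates the classes almost perfectly while learning essentially nothing about feature 1, which is exactly the behaviour described above: enough to tell H from Q, not enough to represent either.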
Let M be the perfect model we want to build to find BMW owners. Then
P(x | M) = 1 if x belongs to Q, and
E[P(x | M)] = N/|H| for x belonging to H, where N is the number of BMW owners in H.
So if we treat H and Q as the negative and positive sets for a classical binary classification, then the precision of M measured on the training sets H and Q is |Q|/(|Q|+N). Note that this differs from the target of most classification methods, which try to maximise precision. Fitting M to H and Q to obtain high precision (as binary classification methods do) does more harm than good for our main target whenever that precision is significantly different from |Q|/(|Q|+N). This is particularly important when N is significantly larger than |Q| (e.g., N = 9|Q|), since |Q|/(|Q|+N) may then be quite small. In short, a good binary classification is not what we aim to find here.
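A quick numeric check of that formula, using hypothetical sizes: the perfect M flags all of Q plus the N hidden owners in H, but against the training labels only Q counts as correct, so its training precision is |Q|/(|Q|+N):

```python
def expected_precision(q_size: int, n_hidden: int) -> float:
    # Training precision of the perfect model M on (H, Q):
    # M predicts positive for Q and for the N hidden owners in H,
    # but the training labels only credit the members of Q.
    return q_size / (q_size + n_hidden)

print(expected_precision(1000, 9000))  # N = 9*|Q| -> 0.1
```

So with N = 9|Q|, a model that scores 90% precision on the training sets is actually far from the perfect M, whose training precision should be only 10%.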
Although binary classification methods may not be designed for detecting false positives, and clearly do not give us enough power to detect all the false positives in H, they do push us in the right direction. A binary classifier may not give high-quality predictions for every false positive, but as long as it helps detect a good chunk of them, we can then use a Markov chain Monte Carlo technique to move the training data and models from the current state into a better state. In principle we can use the predictions produced by M to move both data and models towards an equilibrium state. In practice, we may need a few steps to get a model with good power to detect false negatives.
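The iterative step above can be sketched with a crude deterministic stand-in for the full MCMC machinery: train on (H, Q), promote members of H that the current model scores highly into the positive set, and retrain until the assignment stabilises. Everything here (the toy data, the threshold, the tiny logistic regression) is hypothetical and only meant to show the shape of the loop:

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_logreg(X, y, lr=0.1, steps=2000):
    # Minimal logistic regression by gradient descent (illustration only).
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def refine(H, Q, rounds=5, threshold=0.6):
    # Train on (H, Q), move members of H scoring above the threshold
    # into the positive set, and retrain -- a simple stand-in for
    # moving data and model together towards an equilibrium state.
    pos, neg = Q, H
    for _ in range(rounds):
        X = np.vstack([pos, neg])
        y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
        w, b = fit_logreg(X, y)
        scores = 1.0 / (1.0 + np.exp(-(neg @ w + b)))
        move = scores > threshold
        if not move.any():
            break  # equilibrium: nothing left to promote
        pos = np.vstack([pos, neg[move]])
        neg = neg[~move]
    return pos, neg

# Toy data: Q is drawn around +2; H mixes 400 true negatives (around -2)
# with 100 hidden positives drawn from Q's distribution.
Q = rng.normal(2.0, 0.5, (200, 1))
H = np.vstack([rng.normal(-2.0, 0.5, (400, 1)),
               rng.normal(2.0, 0.5, (100, 1))])
pos, neg = refine(H, Q)
print(len(pos) - len(Q), "members of H promoted to the positive set")
```

Each round the classifier only flags part of the hidden positives, but promoting them sharpens the next model, which in turn flags more of the rest: a few steps, as the paragraph above suggests, rather than one shot.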
(to be continued)
p/s: Laziness is the cause of all the delay.