To Be or Not To Be a Feature
In many prediction problems, you may be overwhelmed by a large number of input or feature variables, many of which are irrelevant for the prediction. There are many techniques for selecting the relevant subset of features. This kind of preprocessing work, done before building a machine learning model, is known as feature engineering.
Proper feature selection is a critical step towards building a good learning model. In this post, my focus will be on feature selection by assigning each feature variable a score based on various statistical measures.
Feature Reduction
In feature reduction, we extract m dimensions out of the original n, where m is less than n. This is generally accomplished with a technique called Principal Component Analysis (PCA).
In PCA, a new set of variables is derived as a function of the original feature variables. The new variables are uncorrelated with each other.
For prediction problems, PCA is not very effective, because it does not take the output or class variable into account.
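To make this concrete, here is a minimal sketch of PCA-based reduction using scikit-learn; the data and the choice of m = 2 components are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 100 samples with n = 5 feature variables
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))

# Extract m = 2 dimensions out of the original n = 5
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Each new variable is a linear function of the original features,
# and the new variables are uncorrelated with each other
print(X_reduced.shape)                # (100, 2)
print(pca.explained_variance_ratio_)  # variance captured by each component
```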
Feature Selection
In feature selection, we select a subset of the original feature set. One brute-force way to select a feature subset is to try all possible subsets with the learning algorithm and choose the one that yields the minimum error. Intuitively, a good feature variable should have the following characteristics.
A good feature variable will be highly correlated with the output variable and largely uncorrelated with the other feature variables.
The procedure for feature subset selection is as follows. All features are assigned scores using statistical measures based on entropy and mutual information. The features are then ranked by score, and the top k features are selected.
Entropy is a measure of the randomness of a variable. Mutual information is a measure of the mutual dependence between two variables.
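As a minimal sketch, assuming small hypothetical categorical data, the two measures can be computed for discrete variables as follows; the avenir implementation computes them at scale on Hadoop rather than in plain Python.

```python
import numpy as np
from collections import Counter

def entropy(x):
    # H(X) = -sum over x of p(x) * log2 p(x)
    counts = np.array(list(Counter(x).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_information(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

# Hypothetical categorical feature and class labels
feature = ['a', 'a', 'b', 'b', 'a', 'b']
label   = [ 0,   0,   1,   1,   0,   1 ]
print(mutual_information(feature, label))  # 1.0 bit: the feature fully determines the label
```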
In my OSS project avenir, I have a Hadoop-based implementation of five statistical measures based on entropy and mutual information. They are as follows:
- Mutual Information Maximization (MIM)
- Mutual Information Feature Selection (MIFS)
- Joint Mutual Information (JMI)
- Double Input Symmetrical Relevance (DISR)
- Minimum Redundancy Maximum Relevance (MRMR)
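To give a flavor of how such a measure works, here is a single-machine Python sketch of greedy MRMR selection, not the avenir Hadoop implementation; it uses scikit-learn's mutual_info_score for discrete variables, and the small feature matrix is hypothetical.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def mrmr(X, y, k):
    # Greedy MRMR: at each step pick the feature with the highest
    # relevance I(Xi;Y) minus its mean redundancy I(Xi;Xj) over the
    # features already selected
    n = X.shape[1]
    relevance = [mutual_info_score(X[:, i], y) for i in range(n)]
    selected, remaining = [], list(range(n))
    for _ in range(k):
        def score(i):
            if not selected:
                return relevance[i]
            redundancy = np.mean([mutual_info_score(X[:, i], X[:, j]) for j in selected])
            return relevance[i] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical discrete feature matrix (rows = samples) and class labels;
# feature 1 mirrors feature 0 and carries no extra information
X = np.array([
    [0, 1, 0],
    [0, 1, 0],
    [1, 0, 0],
    [1, 0, 1],
    [1, 0, 1],
    [1, 0, 1],
    [1, 0, 0],
    [1, 0, 0],
])
y = np.array([0, 0, 0, 1, 1, 1, 1, 1])
print(mrmr(X, y, 2))  # [0, 2]: the redundant feature 1 is skipped
```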
Details of these techniques can be found in my earlier post, where I used hospital readmission as a classification use case with 10 feature variables.
It's not easy to decide which of these techniques will work best. One option is to select the top k features with each technique and run the learning algorithm on each resulting subset. The technique whose subset yields the minimum error is the one to select.
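A minimal sketch of this wrapper-style comparison, assuming a hypothetical dataset and hypothetical top-k index lists from two of the techniques:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical data: 200 samples, 10 feature variables
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 3] > 0).astype(int)  # labels depend on features 0 and 3

# Hypothetical top-k subsets produced by two of the selection techniques
subsets = {
    'MIM':  [0, 3, 5],
    'MRMR': [0, 2, 7],
}

# Run the learning algorithm on each subset; the technique whose
# subset gives the best cross-validated score is the one to keep
for name, cols in subsets.items():
    score = cross_val_score(LogisticRegression(), X[:, cols], y, cv=5).mean()
    print(name, round(score, 3))
```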
Some Examples
Here are some examples, along with heuristics to apply, for deciding whether a feature variable should be retained for building the prediction model (a short sketch applying these checks follows the list).
- Has very little variance: discard it; its correlation with the output variable will be weak.
- Has strong correlation with the output variable and weak correlation with other feature variables: retain it.
- Two variables are strongly correlated with each other and strongly correlated with the output variable: retain one and discard the other.
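A quick sketch of these variance and correlation checks, assuming hypothetical numeric features and output:

```python
import numpy as np

# Hypothetical features: x1 nearly duplicates x0, x2 is almost constant
rng = np.random.default_rng(1)
x0 = rng.normal(size=200)
x1 = x0 + rng.normal(scale=0.1, size=200)
x2 = 3.0 + rng.normal(scale=0.01, size=200)
y = x0 + rng.normal(scale=0.5, size=200)

X = np.column_stack([x0, x1, x2])
print(X.var(axis=0))  # x2 has very little variance: discard it

# Correlation matrix over the features and the output (last row/column)
corr = np.corrcoef(np.column_stack([X, y]), rowvar=False)
print(corr.round(2))
# x0 and x1 are strongly correlated with each other and with y:
# retain one of them and discard the other
```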
Finally
Even if you are not doing feature analysis for building a learning model, using these techniques will give you valuable insights into the feature variables of a problem.