Playing with Features

Here are a few methodologies used to play with features in any dataset:

1.    Feature Engineering: This is probably the broadest term, encompassing most of the others. Feature engineering can be carried out manually or automatically, and can be based on selecting among the original features or on constructing new ones through transformations.
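As a minimal sketch of manual feature engineering (the data and column names here are hypothetical), two classic transformations are constructing a ratio feature and log-transforming a skewed variable:

```python
import numpy as np

# Hypothetical housing records: total price and floor area in square metres.
price = np.array([300_000.0, 450_000.0, 120_000.0])
area = np.array([100.0, 150.0, 60.0])

# Hand-crafted feature: price per square metre, often more informative
# to a model than either raw column on its own.
price_per_m2 = price / area

# Hand-crafted feature: log-transform to tame the skew of raw prices.
log_price = np.log(price)
```

The point is that the new columns encode domain knowledge (unit price, diminishing returns of scale) that the learning algorithm would otherwise have to discover on its own.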

2.    Feature Learning: This term is used when the process of selecting among existing features or constructing new ones is automated; both feature selection and feature extraction can thus be performed through algorithms. Despite the use of automatic methods, an expert is sometimes still needed to decide which algorithm is most appropriate for the data at hand, to determine the optimal number of variables to extract, and so on.
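One simple automated approach can be sketched as a filter-style selector that scores each feature by its correlation with the target and keeps the top k. The data below is synthetic, and Pearson correlation is just one of many possible scoring functions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# The target depends only on columns 0 and 3; the other columns are noise.
y = 2.0 * X[:, 0] - 3.0 * X[:, 3] + rng.normal(scale=0.1, size=200)

# Score each column by absolute Pearson correlation with the target,
# then keep the k highest-scoring columns.
scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                   for j in range(X.shape[1])])
k = 2
selected = np.argsort(scores)[-k:]
X_reduced = X[:, selected]
```

Choosing k here is exactly the kind of decision the paragraph above says may still require an expert.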

3.    Representation Learning: Although this term is sometimes used interchangeably with the previous one, it mostly refers to the use of ANNs to fully automate the feature generation process. Applying ANNs to learn distributed representations of concepts was proposed by Hinton. Today, learned representations are mainly linked to processing natural language, images and other signals with specific kinds of ANNs, such as CNNs.
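As a hedged illustration of the idea (not Hinton's original method), a linear autoencoder trained with plain gradient descent learns a low-dimensional code for each input; the bottleneck activations are the learned representation. All data and hyperparameters below are made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
# 4-dimensional inputs that actually live on a 2-dimensional subspace.
Z = rng.normal(size=(500, 2))
M = rng.normal(size=(2, 4))
X = Z @ M

# Linear autoencoder: a 2-unit bottleneck between encoder and decoder.
W_enc = rng.normal(scale=0.1, size=(4, 2))
W_dec = rng.normal(scale=0.1, size=(2, 4))

mse_before = float(np.mean((X @ W_enc @ W_dec - X) ** 2))

lr = 0.02
for _ in range(2000):
    H = X @ W_enc          # codes: the learned representation
    X_hat = H @ W_dec      # reconstruction of the input
    err = X_hat - X
    # Gradients of the mean squared reconstruction error.
    g_dec = H.T @ err / len(X)
    g_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

final_mse = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
```

No features were designed by hand: the network discovers the 2-dimensional structure purely by minimizing reconstruction error, which is the core idea the paragraph describes.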

4.    Feature Selection: Picking the most informative subset of variables started as a manual process, usually in the hands of domain experts. It can be considered a special case of feature weighting. Although in certain fields the expert is still an important factor, nowadays the selection of variables is usually carried out by computer algorithms, which can operate in a supervised or unsupervised manner. Feature selection is, overall, an essential strategy in the data preprocessing phase.
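An unsupervised example of such an algorithm is a variance-threshold filter, which drops near-constant columns. This is a minimal NumPy sketch over synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
X[:, 2] = 5.0                      # a constant, uninformative column

# Unsupervised filter: drop columns whose variance falls below a threshold.
threshold = 1e-8
variances = X.var(axis=0)
keep = variances > threshold
X_selected = X[:, keep]
```

Because the criterion never looks at a target variable, this filter works even on unlabeled data, which is what makes it an unsupervised selection method.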

5.    Feature Extraction: The goal of this technique is to find a better data representation for the machine learning algorithm that will be applied, since the original representation might not be the best one. It can be approached both manually, in which case the term feature construction is commonly used, and automatically. Some elementary techniques such as normalization, discretization or scaling of variables, as well as basic transformations applied to certain data types, are also considered part of this field.
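Two of the elementary techniques mentioned above can be sketched in a few lines; here, min-max scaling and standardization applied to a small hypothetical matrix:

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# Min-max scaling: map each column onto the [0, 1] interval.
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardization (z-score): zero mean and unit variance per column.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```

Both transformations leave the information content untouched but put the columns on comparable scales, which many learning algorithms implicitly assume.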

6.    Feature Fusion: This more recent term has emerged with the growth of multimedia data processing by machine learning algorithms, especially images, text, and sound. Feature fusion methods aim to combine variables to remove redundant and irrelevant information. Manifold learning algorithms, and especially those based on ANNs, fall into this category.
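A minimal sketch of early fusion, assuming per-sample feature vectors from two modalities are already available: concatenate them, then compress the result with PCA (via SVD) to discard redundant directions. The modality names and dimensions below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical per-sample features from two modalities, e.g. image and text.
img_feats = rng.normal(size=(50, 8))
txt_feats = rng.normal(size=(50, 4))

# Early fusion: concatenate the modalities into one feature matrix.
fused = np.concatenate([img_feats, txt_feats], axis=1)

# Compress with PCA via SVD to remove redundant, low-variance directions.
centered = fused - fused.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
k = 5
fused_reduced = centered @ Vt[:k].T   # projection onto top-k components
```

Replacing the PCA step with an ANN-based embedding gives the manifold-learning variants the paragraph refers to.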
