Playing with Features

Here are a few methodologies used to play with features in any dataset:

1.    Feature Engineering: This is probably the broadest term, encompassing most of the others. Feature engineering can be carried out manually or automatically, and can be based on selecting among the original features or on constructing new ones through transformations.
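As a minimal sketch of manual feature engineering (the data and column names here are hypothetical), two classic transformations are constructing a ratio feature and log-transforming a skewed variable:

```python
import numpy as np

# Hypothetical housing records: total price and floor area in square metres.
price = np.array([300_000.0, 450_000.0, 120_000.0])
area = np.array([100.0, 150.0, 60.0])

# Hand-crafted feature: price per square metre, often more informative
# to a model than either raw column on its own.
price_per_m2 = price / area

# Hand-crafted feature: log-transform to tame the skew of raw prices.
log_price = np.log(price)
```

The point is that the new columns encode domain knowledge (unit price, diminishing returns of scale) that the learning algorithm would otherwise have to discover on its own.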

2.    Feature Learning: This term is used when the process of selecting among existing features or constructing new ones is automated; both feature selection and feature extraction can thus be performed through algorithms. Despite the use of automatic methods, an expert is sometimes still needed to decide which algorithm is most appropriate for the data at hand, to determine the optimal number of variables to extract, and so on.
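One simple automated approach can be sketched as a filter-style selector that scores each feature by its correlation with the target and keeps the top k. The data below is synthetic, and Pearson correlation is just one of many possible scoring functions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# The target depends only on columns 0 and 3; the other columns are noise.
y = 2.0 * X[:, 0] - 3.0 * X[:, 3] + rng.normal(scale=0.1, size=200)

# Score each column by absolute Pearson correlation with the target,
# then keep the k highest-scoring columns.
scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                   for j in range(X.shape[1])])
k = 2
selected = np.argsort(scores)[-k:]
X_reduced = X[:, selected]
```

Choosing k here is exactly the kind of decision the paragraph above says may still require an expert.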

3.    Representation Learning: Although this term is sometimes used interchangeably with the previous one, it mostly refers to the use of ANNs to fully automate the feature generation process. Applying ANNs to learn distributed representations of concepts was proposed by Hinton. Today, learned representations are mainly linked to processing natural language, images and other signals with specific kinds of ANNs, such as CNNs.
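As a hedged illustration of the idea (not Hinton's original method), a linear autoencoder trained with plain gradient descent learns a low-dimensional code for each input; the bottleneck activations are the learned representation. All data and hyperparameters below are made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
# 4-dimensional inputs that actually live on a 2-dimensional subspace.
Z = rng.normal(size=(500, 2))
M = rng.normal(size=(2, 4))
X = Z @ M

# Linear autoencoder: a 2-unit bottleneck between encoder and decoder.
W_enc = rng.normal(scale=0.1, size=(4, 2))
W_dec = rng.normal(scale=0.1, size=(2, 4))

mse_before = float(np.mean((X @ W_enc @ W_dec - X) ** 2))

lr = 0.02
for _ in range(2000):
    H = X @ W_enc          # codes: the learned representation
    X_hat = H @ W_dec      # reconstruction of the input
    err = X_hat - X
    # Gradients of the mean squared reconstruction error.
    g_dec = H.T @ err / len(X)
    g_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

final_mse = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
```

No features were designed by hand: the network discovers the 2-dimensional structure purely by minimizing reconstruction error, which is the core idea the paragraph describes.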

4.    Feature Selection: Picking the most informative subset of variables started as a manual process, usually in the hands of domain experts. It can be considered a special case of feature weighting. Although in certain fields the expert is still an important factor, nowadays the selection of variables is usually carried out by computer algorithms, which can operate in a supervised or unsupervised manner. Feature selection is, overall, an essential strategy in the data preprocessing phase.
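An unsupervised example of such an algorithm is a variance-threshold filter, which drops near-constant columns. This is a minimal NumPy sketch over synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
X[:, 2] = 5.0                      # a constant, uninformative column

# Unsupervised filter: drop columns whose variance falls below a threshold.
threshold = 1e-8
variances = X.var(axis=0)
keep = variances > threshold
X_selected = X[:, keep]
```

Because the criterion never looks at a target variable, this filter works even on unlabeled data, which is what makes it an unsupervised selection method.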

5.    Feature Extraction: The goal of this technique is to find a better data representation for the machine learning algorithm that will be applied, since the original representation might not be the best one. It can be approached both manually, in which case the term feature construction is commonly used, and automatically. Some elementary techniques such as normalization, discretization or scaling of variables, as well as basic transformations applied to certain data types, are also considered part of this field.
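Two of the elementary techniques mentioned above can be sketched in a few lines; here, min-max scaling and standardization applied to a small hypothetical matrix:

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# Min-max scaling: map each column onto the [0, 1] interval.
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardization (z-score): zero mean and unit variance per column.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```

Both transformations leave the information content untouched but put the columns on comparable scales, which many learning algorithms implicitly assume.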

6.    Feature Fusion: This more recent term has emerged with the growth of multimedia data processing by machine learning algorithms, especially images, text, and sound. Feature fusion methods aim to combine variables to remove redundant and irrelevant information. Manifold learning algorithms, and especially those based on ANNs, fall into this category.
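A minimal sketch of early fusion, assuming per-sample feature vectors from two modalities are already available: concatenate them, then compress the result with PCA (via SVD) to discard redundant directions. The modality names and dimensions below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical per-sample features from two modalities, e.g. image and text.
img_feats = rng.normal(size=(50, 8))
txt_feats = rng.normal(size=(50, 4))

# Early fusion: concatenate the modalities into one feature matrix.
fused = np.concatenate([img_feats, txt_feats], axis=1)

# Compress with PCA via SVD to remove redundant, low-variance directions.
centered = fused - fused.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
k = 5
fused_reduced = centered @ Vt[:k].T   # projection onto top-k components
```

Replacing the PCA step with an ANN-based embedding gives the manifold-learning variants the paragraph refers to.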
