Overview of Classification and Prediction Models
Supervised models are the most common models in data mining and machine learning. Their aim is to predict an event or estimate the value of a continuous numeric attribute. Supervised models learn from past cases: they have input variables (X) and an output or target field (Y). The inputs are also called “predictors” because of their role in identifying the prediction function. An input-output “mapping function” then links input data patterns to the outcome and permits the prediction of output values.
Y = f(X)
The process of building supervised models comprises two main phases. In the training phase, input data patterns are associated with specific outcomes: the algorithm iteratively makes predictions on the training data and is corrected whenever its predictions disagree with the known outcomes. In this sense, the training dataset acts like a teacher supervising the learning process, which is why we call these models “Supervised Models”. As soon as the function is built and the relationship is defined, the scoring phase starts: the model is applied to new, unseen records to generate predictions.
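The two phases above can be sketched with scikit-learn's API. The tiny churn-style dataset and the column meanings are invented for illustration, not taken from the article:

```python
# Hypothetical illustration of the training and scoring phases
# (data and feature names are made up for this sketch).
from sklearn.tree import DecisionTreeClassifier

# Training phase: past cases with known outcomes (X = inputs, y = target)
X_train = [[25, 1], [40, 0], [35, 1], [50, 0]]  # e.g. [age, has_addon]
y_train = ["churn", "stay", "churn", "stay"]

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)  # the mapping function Y = f(X) is learned here

# Scoring phase: the learned function is applied to an unseen record
X_new = [[30, 1]]
prediction = model.predict(X_new)
```

Any supervised estimator could stand in for the decision tree here; the fit-then-predict split mirrors the training and scoring phases described above.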
Supervised models are divided into further categories according to their scope and the measurement level of the target they predict:
1-Classification or propensity models
2-Estimation (regression) models
3-Feature Selection
In this article, I will focus on classification or propensity models and their popular algorithms.
CLASSIFICATION / PROPENSITY MODELS
These models are used for predicting categorical outcomes from pre-classified data. The generated models then classify unseen records: the algorithm approximates a propensity (confidence) score for each new record, which indicates the likelihood of the target outcome. In general, this score ranges from 0 to 1.
This approach is commonly used to find the likelihood of a class for an event, such as the propensity of a customer to churn, to buy a particular add-on product, or to default on a loan.
· Churned: yes/no
· Defaulted: yes/no
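A propensity score like the one described above is what `predict_proba` returns in scikit-learn. The churn dataset below is invented for the sketch (say, `[years_as_customer, monthly_usage]`):

```python
# A minimal sketch of propensity scoring (data is made up for illustration).
from sklearn.linear_model import LogisticRegression

X_train = [[1, 200], [5, 50], [2, 180], [7, 30], [3, 150], [8, 20]]
y_train = [0, 1, 0, 1, 0, 1]  # 1 = churned, 0 = stayed

clf = LogisticRegression().fit(X_train, y_train)

# predict_proba returns a propensity (confidence) score between 0 and 1
score = clf.predict_proba([[6, 40]])[0][1]  # likelihood of churn
```

Records can then be ranked by this score, which is exactly how the campaign-targeting example below works in practice.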
As an example of the use of classification models in CRM, these algorithms enable companies to target their customers and optimally customise their campaigns, focusing on the customers with relatively higher propensity scores.
Some of the popular classification algorithms are:
1-DECISION TREES:
These algorithms are popular due to their transparency. Their goal is to create pure subsegments: the dataset is divided into two or more homogeneous sets based on the most significant splitter. There are two main types of decision trees: classification trees, where the target variable is categorical and discrete, and regression trees, where it is continuous. Thus, decision trees can handle both numerical and categorical data.
They consist of decision nodes, branches and leaf nodes: decision nodes test an attribute, branches represent the outcomes of the test, and leaf nodes indicate the result. The deeper the tree, the more complex the decision rules and the closer the fit to the training data.
This model is easy to understand and good for data exploration.
Figure 1 Decision tree examples (Chakure 2019)
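The transparency mentioned above can be seen directly: scikit-learn can print a fitted tree's splits as readable rules. This sketch uses the bundled Iris dataset rather than anything from the article:

```python
# Sketch: a shallow classification tree whose splits can be printed
# as human-readable rules (Iris is a stand-in dataset).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)  # shallow = simple rules
tree.fit(iris.data, iris.target)

# export_text renders the splits, e.g. "petal width (cm) <= 0.80"
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```

Capping `max_depth` keeps the printed rules short, which is the usual trade-off between interpretability and fit.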
2-DECISION RULES:
These algorithms are very similar to decision trees, but they produce a list of rules. Each rule is a simple IF-THEN statement with a condition and a prediction. The main difference between decision trees and decision rules is that multiple rules might apply to the same record, so decision rules are likely to overlap. To handle multiple rules, there are two main strategies: decision lists and decision sets. A decision list introduces order to the rules: if the condition of the first rule is true for an instance, we use the prediction of the first rule; if not, we go to the next rule, check whether it applies, and so on. Decision lists thus solve the problem of overlapping rules by returning only the prediction of the first rule in the list that applies. A decision set resembles a democracy of the rules, except that some rules might have higher voting power (Molnar 2020).
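The decision-list strategy described above maps directly onto ordered IF-THEN statements: the first matching rule wins. The rules and thresholds below are invented purely to illustrate the structure:

```python
# A hand-written decision list (rule thresholds are invented):
# rules are checked in order, and the first match decides the prediction.
def classify_customer(tenure_months, complaints):
    # Rule 1: new customers with many complaints are likely to churn
    if tenure_months < 6 and complaints > 2:
        return "churn"
    # Rule 2: long-tenured customers tend to stay
    if tenure_months >= 24:
        return "stay"
    # Default rule: fall-through prediction when no earlier rule applies
    return "stay"

label = classify_customer(3, 5)  # matches Rule 1 first
```

A customer with 3 months of tenure and 5 complaints also satisfies no later rule's check, because evaluation stops at the first hit; that early exit is exactly how a decision list resolves overlapping rules.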
3-LOGISTIC REGRESSION:
This well-established statistical model is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables. It is commonly used in social science applications, for example to predict whether an email is spam (1) or not (0), or to model a binary target denoting churn.
The results take the form of a continuous function that predicts the probabilities of the target classes. The coefficients quantify the effect of each predictor.
Figure 2 Logistic Regression Function (Grace-Martin 2017)
Regression models differ in the kind of relationship they assume between the dependent and independent variables and in the number of independent variables used. For example, the fundamental difference between linear and logistic regression is that logistic regression is used when the dependent variable is binary, whereas linear regression is used when the dependent variable is continuous and the regression line is linear. Linear regression is about fitting a straight line to the data, while logistic regression fits an S-shaped curve.
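The contrast above can be made concrete with a one-predictor example on synthetic data (the data-generating rule here is an assumption for the sketch): the fitted coefficient acts on the log-odds, and the curve maps any input to a probability between 0 and 1.

```python
# Sketch: logistic regression on synthetic binary data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
# Label is 1 when the (noisy) predictor is positive -- an invented rule
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

logit = LogisticRegression().fit(X, y)

coef = logit.coef_[0][0]                 # effect of the predictor (log-odds)
p = logit.predict_proba([[2.0]])[0][1]   # probability of class 1 at x = 2.0
```

A straight line fitted to the 0/1 labels would leave the unit interval for large inputs; the logistic curve never does, which is why it is the right tool for a binary target.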
4-NEURAL NETWORKS:
A neural network is a class of models within machine learning. It uses complex, nonlinear mapping functions for prediction and classification. The model learns the weights that link the input neurons in the input layer to the target in the output layer. More complex models include hidden, intermediate layers. During the iterative training process, inputs are presented to the network, the model's estimates are compared with the actual outcomes, the required adjustments are made, and the initial weight estimates are optimized.
Figure 3 Neural Network Layers (Ahmedian & Khanteymoori 2015)
This algorithm works best with many data points and, once trained, predicts very fast. Moreover, it can be trained with any number of inputs and layers.
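A small multilayer perceptron shows the layer structure described above. The dataset is synthetic and the hidden-layer size is an arbitrary choice for the sketch, not something prescribed by the article:

```python
# Sketch: a neural network with one hidden (intermediate) layer.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=4, random_state=0)

# Input layer (4 features) -> hidden layer (8 neurons) -> output layer
net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
net.fit(X, y)  # iterative training adjusts the connection weights

accuracy = net.score(X, y)
```

Adding entries to `hidden_layer_sizes`, e.g. `(16, 8)`, adds further intermediate layers, which is the "any number of inputs and layers" flexibility noted above.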
5-SUPPORT VECTOR MACHINES
This model is used for classification, regression and outlier detection. It works with nonlinear, complex data patterns and helps prevent overfitting. SVM works by mapping data to a high-dimensional feature space in which records become more easily separable with respect to the target classes (Chorianopoulos 2016). The input training data are transformed through nonlinear kernel functions, and a linear function then classifies the cases in an optimal way. In this separation process, the model finds a hyperplane that maximizes the margin distance in an N-dimensional space. SVM algorithms have low transparency because they do not explain their predictions.
Figure 4 Classification of data by support vector machine (Nieto 2016)
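The kernel mapping described above is easiest to see on data that no straight line can separate. The two concentric rings below are synthetic, generated just for this sketch:

```python
# Sketch: an SVM with an RBF kernel on two concentric rings of points.
# No straight line separates the rings in 2D, but the kernel implicitly
# maps them to a space where a linear separator (hyperplane) exists.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

svm = SVC(kernel="rbf")  # nonlinear kernel function
svm.fit(X, y)

accuracy = svm.score(X, y)
```

Swapping in `kernel="linear"` on the same data would perform poorly, which isolates the contribution of the nonlinear kernel.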
6-BAYESIAN NETWORKS
Bayesian networks are a type of probabilistic graphical model based on Bayes' theorem. They provide a visual representation of the relationships between attributes and explain the rationale of the model. The model estimates, for each record, the probability of pertaining to each target class.
Figure 5 Bayesian Network Example (Ju 2018)
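The simplest Bayesian network classifier is naive Bayes, where the target node is the parent of conditionally independent attribute nodes; scikit-learn ships it as `GaussianNB`. The two-feature training set below is invented for the sketch:

```python
# Sketch: naive Bayes, the simplest Bayesian-network classifier
# (the [height, weight]-style training data is made up).
from sklearn.naive_bayes import GaussianNB

X_train = [[180, 80], [160, 55], [175, 75], [155, 50]]
y_train = ["a", "b", "a", "b"]

nb = GaussianNB().fit(X_train, y_train)

# Bayes' theorem yields the probability of pertaining to each target class
probs = nb.predict_proba([[178, 78]])[0]
```

Full Bayesian networks with arbitrary graph structure need a dedicated library, but the class-probability output shown here is exactly the quantity the text describes.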
REFERENCES
Grace-Martin, K., 2017. What Is A Logit Function And Why Use Logistic Regression? - The Analysis Factor. [online] The Analysis Factor. Available at: <https://www.theanalysisfactor.com/what-is-logit-function/> [Accessed 21 September 2020].
García-Gonzalo, E., Fernández-Muñiz, Z., García Nieto, P., Bernardo Sánchez, A. and Menéndez Fernández, M., 2016. Hard-Rock Stability Analysis for Span Design in Entry-Type Excavations with Learning Classifiers. Materials, 9(7), p.531.
Scikit-learn.org. 2020. 1.4. Support Vector Machines — Scikit-Learn 0.23.2 Documentation. [online] Available at: <https://scikit-learn.org/stable/modules/svm.html> [Accessed 20 September 2020].
Medium. 2020. Decision Tree For Classification With Example And Why Or Why Not We Use Them.. [online] Available at: <https://medium.com/@rdhawan201455/decision-tree-for-classification-with-example-and-why-or-why-not-we-use-them-296d533a91eb> [Accessed 20 September 2020].
Medium. 2020. Introduction To Bayesian Networks. [online] Available at: <https://towardsdatascience.com/introduction-to-bayesian-networks-81031eeed94e> [Accessed 17 September 2020].
Jihongju.github.io. 2020. Representation - Bayesian Networks - Jihong Ju's Blog. [online] Available at: <https://jihongju.github.io/2018/11/11/pgm-lecture-note-01/> [Accessed 21 September 2020].