Regression VS Classification

Regression and Classification algorithms are both Supervised Learning algorithms. Both are used for prediction in Machine Learning and both work with labeled datasets. The difference between them lies in the kinds of problems they are used to solve.

The main difference between Regression and Classification algorithms is that Regression algorithms are used to predict continuous values such as price, salary, or age, while Classification algorithms are used to predict or classify discrete values such as Male or Female, True or False, or Spam or Not Spam.


Classification

Classification is the process of finding a function that divides the dataset into classes based on different parameters. In Classification, a computer program is trained on the training dataset and, based on that training, categorizes new data into different classes.

The task of a classification algorithm is to find the mapping function that maps the input (x) to a discrete output (y).

Example to understand classification:

The best example to understand the Classification problem is email spam detection. The model is trained on millions of emails using different parameters, and whenever it receives a new email, it identifies whether the email is spam or not. If the email is spam, it is moved to the Spam folder.
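As an illustrative sketch (not a production spam filter), such a classifier can be trained on simple numeric features of an email. The two features below (a suspicious-word count and an exclamation-mark count) are hypothetical, chosen only for illustration:

```python
from sklearn.linear_model import LogisticRegression

# Toy training data: each row is [suspicious_word_count, exclamation_count]
# (hypothetical features used only for illustration).
X_train = [[0, 0], [1, 0], [0, 1],   # normal emails -> label 0
           [5, 4], [6, 5], [7, 6]]   # spam emails   -> label 1
y_train = [0, 0, 0, 1, 1, 1]

model = LogisticRegression().fit(X_train, y_train)

# Classify two new emails: one "clean", one spam-like.
pred_clean = model.predict([[0, 1]])[0]
pred_spam = model.predict([[6, 5]])[0]
```

The model maps each input x to a discrete output y, here 0 ("not spam") or 1 ("spam").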

There are different types of Classification algorithms:

  • Logistic Regression
  • K-Nearest Neighbours
  • Support Vector Machines
  • Kernel SVM
  • Naïve Bayes
  • Decision Tree Classification
  • Random Forest Classification

These Classification algorithms can be further sub-divided based on the type of model. On that basis, there are two groups of Classification algorithms:

  1. Linear Models

  • Logistic Regression
  • Support Vector Machines

  2. Non-Linear Models

  • K-Nearest Neighbours
  • Kernel SVM
  • Naïve Bayes
  • Decision Tree Classification
  • Random Forest Classification

  1. K-Nearest Neighbours: The k-nearest neighbours (KNN) algorithm is a simple supervised machine learning algorithm that can be used to solve both classification and regression problems.
  2. Kernel SVM: A kernel is a function used in SVM to help solve problems that are not linearly separable. Kernels provide shortcuts that avoid complex calculations: with their help we can implicitly work in higher-dimensional spaces, even an infinite number of dimensions, while keeping the computations smooth.
  3. Naive Bayes: In statistics, naive Bayes classifiers are a family of simple "probabilistic classifiers" based on applying Bayes' theorem with strong (naive) independence assumptions between the features (see Bayes classifier). They are among the simplest Bayesian network models, but coupled with kernel density estimation they can achieve high accuracy levels.
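The KNN idea above can be sketched in a few lines of plain Python (a minimal illustration, not an optimized implementation):

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify point x by majority vote among its k nearest training points."""
    # Sort all training points by Euclidean distance to x.
    dists = sorted((math.dist(p, x), label) for p, label in zip(X_train, y_train))
    # Majority vote among the k closest labels.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Two clusters: class "a" near the origin, class "b" near (5, 5).
X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
y = ["a", "a", "a", "b", "b", "b"]

label = knn_predict(X, y, (0.5, 0.5))   # the closest neighbours are all "a"
```

Note that KNN stores the whole training set and does its work at prediction time, which is why it counts as a non-linear model here.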

Evaluating Classification Models

Once our model is built, whether it is a Classification or a Regression model, it is necessary to evaluate its performance. For evaluating a Classification model, we have the following ways:

  1. Log Loss or Cross-Entropy Loss:

  • It is used for evaluating the performance of a classifier whose output is a probability value between 0 and 1.
  • For a good binary Classification model, the value of log loss should be near to 0.
  • The value of log loss increases if the predicted value deviates from the actual value.
  • The lower log loss represents the higher accuracy of the model.
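As a sketch, log loss for binary labels can be computed directly from its formula:

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Average binary cross-entropy; lower is better, 0 is a perfect score."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)          # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

confident_loss = log_loss([1, 0], [0.9, 0.1])  # good predictions -> small loss
uncertain_loss = log_loss([1, 0], [0.5, 0.5])  # coin-flip predictions -> larger loss
```

Confident correct predictions give a loss near 0, while the loss grows as the predicted probabilities deviate from the actual labels.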

  2. Confusion Matrix:

  • The confusion matrix provides a matrix/table as output and describes the performance of the model.
  • It is also known as the error matrix.
  • The matrix summarizes the prediction results, giving the total numbers of correct and incorrect predictions. It looks like the table below:

                       Actual Positive     Actual Negative
  Predicted Positive   True Positive       False Positive
  Predicted Negative   False Negative      True Negative
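The four cells of a binary confusion matrix can be counted directly (a minimal sketch):

```python
def confusion_counts(y_true, y_pred):
    """Return the four cells of a binary confusion matrix."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

# 2 correct positives, 1 correct negative, 1 false alarm, 1 miss:
cm = confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

The diagonal cells (TP, TN) are the correct predictions; the off-diagonal cells (FP, FN) are the errors.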



  3. AUC-ROC Curve:

  • ROC curve stands for Receiver Operating Characteristics Curve and AUC stands for Area Under the Curve.
  • It is a graph that shows the performance of the classification model at different classification thresholds.
  • It is most commonly used to visualize the performance of binary classifiers, and it can be extended to multi-class models (for example, one-vs-rest).
  • The ROC curve plots TPR against FPR, with the True Positive Rate (TPR) on the Y-axis and the False Positive Rate (FPR) on the X-axis.
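Each point on the ROC curve comes from one threshold. The sketch below computes the (FPR, TPR) point for a given threshold from predicted scores:

```python
def roc_point(y_true, scores, threshold):
    """Return (FPR, TPR) when scores >= threshold are classified as positive."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn), tp / (tp + fn)

# Sweeping the threshold from high to low traces the curve from (0, 0) to (1, 1).
y = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
fpr, tpr = roc_point(y, scores, 0.5)
```

The AUC is the area under the curve traced by these points as the threshold sweeps over all score values.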


Regression

Regression is a process of finding the correlations between dependent and independent variables. It helps in predicting the continuous variables such as prediction of Market Trends, prediction of House prices, etc.


Example to understand Regression:

Suppose we want to do weather forecasting; for this, we use a Regression algorithm. In weather prediction, the model is trained on past data, and once training is complete, it can predict the weather for future days.

There are different types of Regression:

  • Simple Linear Regression
  • Multiple Linear Regression
  • Polynomial Regression
  • Support Vector Machines
  • Decision Tree Regression
  • Random Forest Regression

  1. Linear Regression:

Linear Regression is a supervised ML algorithm. It predicts a dependent (target) variable based on the given independent variable(s). This regression technique finds a linear relationship between the dependent variable and the independent variables, hence the name Linear Regression.
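For one independent variable, the line of best fit has a closed-form solution. A minimal sketch of ordinary least squares:

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

# Data lying exactly on y = 2x + 1 is recovered exactly.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

With noisy data the same formulas return the line that minimizes the sum of squared residuals.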

  2. Decision Tree:

Decision tree models can be applied to data that contains both numerical and categorical features. Decision trees are good at capturing non-linear interaction between the features and the target variable, and they somewhat match human-level thinking, so the resulting model is very intuitive to understand.
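A decision tree's building block is a single split. The sketch below fits a one-split "stump" that minimizes squared error, assuming a single numeric feature:

```python
def fit_stump(xs, ys):
    """Fit a one-split regression tree: predict the mean of each side."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):                     # try every split point
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - lmean) ** 2 for y in left)
               + sum((y - rmean) ** 2 for y in right))
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, threshold, lmean, rmean = best
    return lambda x: lmean if x < threshold else rmean

# Two flat regions -> the stump finds the jump between x=3 and x=10.
predict = fit_stump([1, 2, 3, 10, 11, 12], [1, 1, 1, 9, 9, 9])
```

A full decision tree applies this split search recursively to each side, which is how it captures non-linear relationships.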

  3. Support Vector Regression:

You may have heard of SVM, the Support Vector Machine. SVR uses the same idea as SVM, but here it tries to predict real values. The algorithm uses hyperplanes to fit the data. If this is not possible in the original space, it uses the kernel trick: the dimension is increased until the data points can be fitted by a hyperplane.
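A small sketch with scikit-learn's SVR, using a linear kernel on toy linear data (the parameter values here are chosen only for illustration):

```python
from sklearn.svm import SVR

# Toy data on the line y = 2x.
X = [[i] for i in range(6)]
y = [2 * i for i in range(6)]

# With a linear kernel and a large C, the fit stays close to the data;
# swapping in kernel="rbf" invokes the kernel trick mentioned above.
model = SVR(kernel="linear", C=100).fit(X, y)
pred = model.predict([[2.5]])[0]   # should land near y = 5
```

SVR fits a tube of width epsilon around the data and only penalizes points outside it, which is what distinguishes it from ordinary least squares.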

  4. Lasso Regression:

  • LASSO stands for Least Absolute Shrinkage and Selection Operator. Shrinkage is basically defined as a constraint on attributes or parameters.
  • The algorithm operates by applying a constraint on the model attributes that causes the regression coefficients for some variables to shrink toward zero.
  • Variables with a regression coefficient of zero are excluded from the model.
  • So, lasso regression analysis is basically a shrinkage and variable selection method and it helps to determine which of the predictors are most important.
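The "shrink toward zero" behaviour can be seen in the soft-thresholding rule, which is the exact lasso solution in the special case of an orthonormal design (a sketch; `lam` denotes the regularization strength):

```python
def soft_threshold(beta_ols, lam):
    """Shrink each OLS coefficient toward zero by lam; small ones become exactly 0."""
    shrunk = []
    for b in beta_ols:
        if b > lam:
            shrunk.append(b - lam)
        elif b < -lam:
            shrunk.append(b + lam)
        else:
            shrunk.append(0.0)   # this variable is dropped from the model
    return shrunk

# The weak predictors (-0.5 and 0.8) are eliminated; 3.0 shrinks to 2.0.
coefs = soft_threshold([3.0, -0.5, 0.8], lam=1.0)
```

Because weak coefficients land exactly at zero, lasso performs variable selection as a side effect of fitting.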

  5. Random Forest Regressor:

Random Forests are an ensemble (combination) of decision trees. Random Forest is a Supervised Learning algorithm used for both classification and regression. The input data is passed through multiple decision trees. It works by constructing a number of decision trees at training time and outputting the class that is the mode of the classes (for classification) or the mean prediction (for regression) of the individual trees.
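A minimal scikit-learn sketch of the mean-of-trees idea (toy data; parameters chosen only for illustration):

```python
from sklearn.ensemble import RandomForestRegressor

# Toy data on the line y = 2x.
X = [[i] for i in range(10)]
y = [2 * i for i in range(10)]

# Each of the 50 trees is trained on a bootstrap sample of the data;
# the forest's prediction is the mean of the individual trees' predictions.
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
pred = forest.predict([[4]])[0]   # should land near y = 8
```

Averaging many trees trained on different bootstrap samples reduces the variance of a single deep tree, which is the main reason to prefer the ensemble.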

There are many different algorithms in the Machine Learning paradigm. We can use them to understand the data at hand and to build models that automate our work.















