Regression vs. Classification in Machine Learning
Machine learning is largely about predicting an output from the inputs it is given. These inputs, called features, can be categorical or numeric. For example, consider predicting whether your bank will approve your loan or not. In this case, the model takes in important information about your banking transactions, previous loans, CIBIL score, etc., and after analyzing all these factors it predicts whether your loan will be approved or not.
Regression and classification are the two primary tasks in supervised machine learning, and each deals with a different kind of problem.
What is Regression?
Regression is used for predicting continuous values based on the input data. Continuous values are numbers that can take any value within a range, e.g. 0.2323, 1.23, 4.23, 23, 43, 0.45. Use regression when you want to predict quantities such as income, height, weight, or price.
Examples:
All of these predict an output that is a continuous numeric value, so they fall under the category of regression.
ML algorithms used for regression problems:
(There are more algorithms for regression problems, but these five are the most commonly used.)
(Also, don't worry if you have not come across these algorithms before. I will explain each one of them in the upcoming articles.)
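To make the idea concrete, here is a minimal sketch of regression in plain Python. It fits a straight line with ordinary least squares, which is simply my choice for illustration. The data and the names `fit_line` and `predict` are made up for this example.

```python
# Made-up data: years of experience -> income (in thousands).
years = [1, 2, 3, 4, 5]
income = [31, 34, 41, 44, 50]

def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sum of squared deviations of x.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(years, income)

def predict(x):
    """Predict a continuous value for a new input."""
    return slope * x + intercept

print(round(predict(6), 2))    # income estimate for 6 years
print(round(predict(2.5), 2))  # inputs between data points work too
```

Notice that the output is a number that can land anywhere on the line (roughly 54.4 for 6 years with this toy data), which is exactly what makes it a regression problem.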
What is Classification?
Classification is used when you want to categorize data into different classes or groups. For example, classifying emails as "spam" or "not spam" or predicting whether a patient has a certain disease based on their symptoms.
Problems like Yes/No, Pass/Fail, Healthy/Not healthy can be solved using classification algorithms.
Examples:
All of these questions ask you to sort the output into a category, so they fall under classification.
ML algorithms used for classification problems:
(I am going to discuss all these algorithms in detail in the upcoming articles.)
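Here is a minimal sketch of classification in plain Python, using the Pass/Fail idea from above. I picked a k-nearest neighbors rule only because it is short to write. The data and the function name `classify` are made up for this example.

```python
import math
from collections import Counter

# Made-up training data: (hours studied, classes attended) -> label.
training_data = [
    ((1.0, 2.0), "Fail"),
    ((2.0, 1.0), "Fail"),
    ((2.5, 3.0), "Fail"),
    ((6.0, 8.0), "Pass"),
    ((7.0, 7.5), "Pass"),
    ((8.0, 9.0), "Pass"),
]

def classify(point, data, k=3):
    """Return the majority label among the k closest training points."""
    by_distance = sorted(data, key=lambda item: math.dist(point, item[0]))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

print(classify((1.5, 2.5), training_data))  # Fail
print(classify((7.5, 8.0), training_data))  # Pass
```

Whatever input you give it, the answer is always one of the known categories, never a number in between.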
While we are talking about regression and classification, we also need to know how to interpret the two graphically. Two concepts describe the difference between them: the decision boundary (used in classification) and the best-fit line (used in regression).
A decision boundary is a boundary drawn among the data points to separate them into categories or classes, while a best-fit line is the line the algorithm fits as closely as possible to the data points in order to predict continuous values.
In the classification graph, we can see that a decision boundary is created which separates the data points into 2 groups. In regression, we can see a straight line that passes as close as possible to all the data points, typically by minimizing the overall error between the line and the points, so that it can predict values in a continuous way.
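The difference can also be sketched in a few lines of Python. For the decision boundary I assume a simple nearest-centroid rule on one feature, where the boundary is the midpoint between the two class averages. For the best-fit line I just assume a slope and intercept instead of fitting them. All names and numbers here are hypothetical.

```python
# Made-up 1-D data: hours studied by students who failed and who passed.
fail_hours = [1.0, 2.0, 3.0]
pass_hours = [6.0, 7.0, 8.0]

def mean(values):
    return sum(values) / len(values)

# Nearest-centroid rule: the decision boundary is the midpoint
# between the two class means.
boundary = (mean(fail_hours) + mean(pass_hours)) / 2

def classify(hours):
    """Classification: the output depends on the side of the boundary."""
    return "Pass" if hours > boundary else "Fail"

def predict_score(hours, slope=10.0, intercept=5.0):
    """Regression: the output is a point on the line.
    The slope and intercept are assumed values, not fitted here."""
    return slope * hours + intercept

for hours in (4.0, 5.0):
    print(hours, classify(hours), predict_score(hours))
```

Moving from 4 to 5 hours crosses the boundary at 4.5, so the class flips from Fail to Pass in one jump, while the predicted score moves smoothly along the line from 45 to 55.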
💡 Quick Tip to Remember:
If your answer is a number, it’s probably regression. If your answer is a category, it’s probably classification.
That’s it for today’s concept! In the next article, I’ll try to cover some other topics — stay tuned!
Let’s learn ML one topic at a time 📚💻
Love this, Arpit. Helped a lot to now distinguish between regression and classification 👍 thank you☺️