Multiple Linear Regression
Hello everyone, today we will learn about a machine learning algorithm. As the heading suggests, it is multiple linear regression. Linear regression models the relationship between a dependent variable and one or more independent variables.
- It is a supervised learning method and an extension of simple linear regression.
- Two or more independent variables (x1, x2, …, xn) are used to predict or explain the variance in y, the dependent variable.
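Written as an equation, the relationship described above looks roughly like this, where y is the dependent variable and x1 … xn are the independent variables. The symbols m_1 … m_n (one coefficient per independent variable) and c (the intercept) are notation assumed here, chosen to match the familiar y = mx + c form of simple linear regression:

```latex
y = m_1 x_1 + m_2 x_2 + \dots + m_n x_n + c
```

With a single independent variable (n = 1) this reduces to y = mx + c.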
We work with a dataset named hiring.csv. This file contains hiring statistics for a firm: each candidate's experience, written test score and personal interview score. Based on these 3 factors we build a model that predicts the salary of a candidate. Using this model, we predict salaries for the following candidates:
- Candidate 1: 2 yr experience, 9 test score, 6 interview score
- Candidate 2: 12 yr experience, 10 test score, 10 interview score
- Candidate 3: 15 yr experience, 9 test score, 9 interview score
First we import the libraries we need: numpy, pandas and sklearn.
- Numpy - used to perform mathematical operations on arrays, such as trigonometric, statistical and algebraic routines.
- Pandas - used for data analysis; it provides highly optimized performance.
- Sklearn - provides built-in tools such as LinearRegression and train_test_split. With train_test_split we can divide a data set into two parts, one for training and one for testing.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import numpy as np
- Here we read our CSV file with the function pd.read_csv().
- Now we print the data types in the data frame, which lists the data type of each column.
- Now we print the keys (column names) of our data frame.
- Then we print df.
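The loading and inspection steps above can be sketched as follows. Since hiring.csv itself is not reproduced in this post, the sketch reads a small made-up sample from a string instead. The rows are hypothetical, and the column names "interview_score(out of 10)" and "salary($)" are assumptions; only "experience" and "test_score(out of 10)" are named in the text.

```python
import io

import pandas as pd

# Hypothetical stand-in for hiring.csv. The rows are invented for illustration,
# and the last two column names are assumptions.
SAMPLE_CSV = """experience,test_score(out of 10),interview_score(out of 10),salary($)
,8,9,50000
five,6,7,60000
two,10,10,65000
seven,,6,70000
"""

# With the real file this would be: df = pd.read_csv("hiring.csv")
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(df.dtypes)  # data type of each column
print(df.keys())  # column labels of the data frame
print(df)         # the full table, including the NaN cells
```

Empty cells in the CSV are read as NaN, which is why the later steps have to fill them in.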
- Here we map number words (such as "five") to numbers and store the mapping in cleanup_nums.
- Then we pass it to the function df.replace() and print df.
- Now we replace the null (NaN) values in df.
- In the column "experience" we fill NaN with 0.
- And in "test_score(out of 10)" we fill NaN with the mean of test_score(out of 10).
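A minimal sketch of these cleanup steps is shown below. It runs on a small hypothetical data frame rather than the real hiring.csv, so the rows and the exact contents of cleanup_nums are assumptions; the real mapping would need one entry for every number word that appears in the file.

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for the uncleaned hiring data.
df = pd.DataFrame({
    "experience": pd.Series([np.nan, "five", "two", "seven"], dtype=object),
    "test_score(out of 10)": [8.0, 6.0, 10.0, np.nan],
})

# Map number words to numbers, per column (nested dict: column -> {old: new}).
cleanup_nums = {"experience": {"two": 2, "five": 5, "seven": 7}}
df = df.replace(cleanup_nums)

# Missing experience is treated as 0 years.
df["experience"] = df["experience"].fillna(0).astype(int)

# Missing test scores are filled with the mean of the known scores.
score_col = "test_score(out of 10)"
df[score_col] = df[score_col].fillna(df[score_col].mean())

print(df)
```

Filling with the column mean keeps the row usable without pulling the average test score up or down.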
- We use the LinearRegression class to create our model and store it in a variable named model.
- Now we fit it by passing the arguments x (the three score and experience columns) and y (the salary).
- Now we find y_pred with the formula y = mx + c, where m holds the fitted coefficients and c is the intercept.
- We also find it with the built-in method model.predict.
- In this way we find the salaries of Candidates 1, 2 and 3. We have a function named hiring, to which we pass experience, test_score and interview_score.
- Now we pass all these arguments to y_pred and then print it.
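The fitting and prediction steps above might look roughly like this. The training rows are hypothetical stand-ins for the cleaned hiring data, so the printed salaries will not match those from the real file. The column names "interview_score(out of 10)" and "salary($)", the body of hiring, and its tuple return value are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical, already-cleaned rows standing in for hiring.csv.
df = pd.DataFrame({
    "experience": [0, 5, 2, 7, 3, 10],
    "test_score(out of 10)": [8.0, 6.0, 10.0, 9.0, 7.0, 8.0],
    "interview_score(out of 10)": [9, 7, 10, 6, 10, 7],
    "salary($)": [50000, 60000, 65000, 70000, 62000, 72000],
})

feature_cols = ["experience", "test_score(out of 10)", "interview_score(out of 10)"]
x = df[feature_cols]  # independent variables
y = df["salary($)"]   # dependent variable

model = LinearRegression()
model.fit(x, y)


def hiring(experience, test_score, interview_score):
    """Return the predicted salary computed two ways: by formula and by predict."""
    row = pd.DataFrame([[experience, test_score, interview_score]], columns=feature_cols)
    # y = m1*x1 + m2*x2 + m3*x3 + c, using the fitted coefficients and intercept
    by_formula = float(np.dot(model.coef_, row.to_numpy()[0]) + model.intercept_)
    # the same prediction via the built-in method
    by_predict = float(model.predict(row)[0])
    return by_formula, by_predict


# Candidates 1, 2 and 3: (experience, test score, interview score)
for candidate in [(2, 9, 6), (12, 10, 10), (15, 9, 9)]:
    y_pred = hiring(*candidate)
    print(candidate, y_pred)
```

Both values in each printed pair should agree up to floating-point rounding, since model.predict applies the same linear formula internally.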