MACHINE LEARNING ALGORITHMS - Regression - Part 1 of 12
REGRESSION :
- OLSR - Ordinary Least Squares Regression
In statistics, ordinary least squares (OLS), or linear least squares, is a method for estimating the unknown parameters in a linear regression model. The goal is to minimize the sum of the squared differences between the observed responses in a dataset and the responses predicted by the linear approximation of the data (visually, this is the sum of the squared vertical distances between each data point in the set and the corresponding point on the regression line - the smaller these differences, the better the model fits the data). The resulting estimator can be expressed by a simple formula, especially in the case of a single regressor on the right-hand side.
The OLS estimator is consistent when the regressors are exogenous and there is no perfect multicollinearity, and it is optimal in the class of linear unbiased estimators when the errors are homoscedastic and serially uncorrelated. Under these conditions, OLS provides minimum-variance mean-unbiased estimation when the errors have finite variances. Under the additional assumption that the errors are normally distributed, OLS is the maximum likelihood estimator. OLS is used in economics (econometrics), political science and electrical engineering (control theory and signal processing), among many other areas of application. The multi-fractional order estimator is an expanded version of OLS.
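As an illustration, here is a minimal sketch of the closed-form OLS estimate (the "simple formula" mentioned above) using NumPy; the variable names and simulated data are illustrative only, not part of the original article.

```python
# Minimal OLS sketch: beta_hat = (X'X)^(-1) X'y on synthetic data.
import numpy as np

rng = np.random.default_rng(42)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept column + one regressor
true_beta = np.array([1.0, 3.0])
y = X @ true_beta + rng.normal(scale=0.5, size=n)        # responses with homoscedastic noise

# Solve the normal equations; np.linalg.lstsq is the numerically safer choice in practice.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                                          # should be close to [1.0, 3.0]
```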
LINEAR REGRESSION :
Linear regression is the most basic and commonly used predictive analysis. Regression estimates are used to describe data and to explain the relationship between one dependent variable and one or more independent variables.
At the center of the regression analysis is the task of fitting a single line through a scatter plot. The simplest form, with one dependent and one independent variable, is defined by the formula y = c + b*x, where y = estimated dependent variable, c = constant (intercept), b = regression coefficient (slope), and x = independent variable.
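A minimal sketch of fitting y = c + b*x with SciPy's linregress; the x and y values below are made up purely for illustration.

```python
# Fit y = c + b*x by ordinary least squares for a single predictor.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

result = stats.linregress(x, y)
print("c (constant):", result.intercept)
print("b (regression coefficient):", result.slope)
```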
Sometimes the dependent variable is also called a criterion variable, endogenous variable, prognostic variable, or regressand. The independent variables are also called exogenous variables, predictor variables or regressors.
However, linear regression analysis consists of more than just fitting a straight line through a cloud of data points. It consists of 3 stages – (1) analyzing the correlation and directionality of the data, (2) estimating the model, i.e., fitting the line, and (3) evaluating the validity and usefulness of the model.
There are 3 major uses for regression analysis – (1) causal analysis, (2) forecasting an effect, and (3) trend forecasting. Unlike correlation analysis, which focuses on the strength of the relationship between two or more variables, regression analysis assumes a dependence or causal relationship between one or more independent variables and one dependent variable.
Firstly, it can be used to identify the strength of the effect that the independent variable(s) have on the dependent variable. Typical questions are: what is the strength of the relationship between dose and effect, between sales and marketing spend, or between age and income?
Secondly, it can be used to forecast the effects or impacts of changes. That is, regression analysis helps us understand how much the dependent variable will change when we change one or more independent variables. A typical question is: how much additional Y do I get for one additional unit of X?
Thirdly, regression analysis predicts trends and future values, and can be used to obtain point estimates. Typical questions are: what will the price of gold be six months from now? What is the total effort for task X?
There are several linear regression analyses available to the researcher.
• Simple linear regression
1 dependent variable (interval or ratio), 1 independent variable (interval or ratio or dichotomous)
• Multiple linear regression
1 dependent variable (interval or ratio), 2+ independent variables (interval or ratio or dichotomous)
• Logistic regression
1 dependent variable (binary), 2+ independent variable(s) (interval or ratio or dichotomous)
• Ordinal regression
1 dependent variable (ordinal), 1+ independent variable(s) (nominal or dichotomous)
• Multinomial regression
1 dependent variable (nominal), 1+ independent variable(s) (interval or ratio or dichotomous)
• Discriminant analysis
1 dependent variable (nominal), 1+ independent variable(s) (interval or ratio)
When selecting the model for the analysis, another important consideration is model fit. Adding independent variables to a linear regression model will always increase the explained variance of the model (typically expressed as R²). However, adding more and more variables makes the model inefficient, and overfitting occurs. Occam's razor describes the problem well – a model should be as simple as possible, but not simpler. Statistically, if the model includes a large number of variables, the probability increases that some variables will test as statistically significant purely due to random effects.
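The following sketch illustrates the point on synthetic data: plain R² never decreases as pure-noise predictors are added, while adjusted R² (computed here from its standard formula) penalizes the extra variables. The data, variable names, and use of scikit-learn are assumptions for illustration.

```python
# R² only ever goes up as predictors are added, even if they are pure noise;
# adjusted R² corrects for the number of predictors p.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 60
x_real = rng.normal(size=(n, 1))
y = 2.0 * x_real[:, 0] + rng.normal(size=n)

for extra in (0, 5, 10):                                  # number of noise predictors added
    X = np.hstack([x_real, rng.normal(size=(n, extra))])
    p = X.shape[1]
    r2 = LinearRegression().fit(X, y).score(X, y)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    print(f"{p} predictors: R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```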
The second concern of regression analysis is underfitting, which means that the regression estimates are biased. Underfitting occurs when including an additional independent variable in the model would reduce the effect strength of the existing independent variable(s). Mostly, underfitting happens when linear regression is used to prove a cause-and-effect relationship that is not there. This might be due to the researcher's empirical pragmatism or the lack of a sound theoretical basis for the model.
Logistic Regression :
Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, the logistic regression is a predictive analysis. Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more metric (interval or ratio scale) independent variables.
Standard linear regression requires the dependent variable to be measured on a metric (interval or ratio) scale. How can we apply the same principle to a dichotomous (0/1) variable? Logistic regression models the dependent variable as a stochastic event. For instance, if we analyze a pesticide's kill rate, the outcome event is either killed or alive. Since even the most resistant bug can only be in one of these two states, logistic regression works with the likelihood of the bug getting killed. If the estimated likelihood of killing the bug is > 0.5, it is assumed dead; if it is < 0.5, it is assumed alive.
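A minimal logistic-regression sketch of the pesticide example, assuming scikit-learn is available; the dose/kill data below are invented for illustration, and the 0.5 cut-off matches the rule described above.

```python
# Logistic regression on a dichotomous outcome: the model returns P(killed),
# and a probability above 0.5 is read as "dead".
import numpy as np
from sklearn.linear_model import LogisticRegression

dose = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
killed = np.array([0, 0, 0, 1, 0, 1, 1, 1])        # 1 = killed, 0 = alive

model = LogisticRegression().fit(dose, killed)
p_killed = model.predict_proba([[2.2]])[0, 1]      # P(killed) at an unseen dose
print(p_killed, "=> dead" if p_killed > 0.5 else "=> alive")
```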
The outcome variable – which must be coded as 0 and 1 – is placed in the first box, labeled Dependent, while all predictors are entered into the Covariates box (categorical variables should be appropriately dummy coded). SPSS predicts the value labeled 1 by default, so careful attention should be paid to the coding of the outcome (usually it makes more sense to model the presence of a characteristic, or "success"). Sometimes, instead of a logit model, a probit model is used for logistic regression. Both link functions are commonly used; in many cases a model is fitted with each function and the one with the better fit is chosen. The difference can be seen by plotting the two links over the range (-4, 4): the probit model assumes that the probability of the event follows the cumulative normal distribution, while the logit model assumes it follows the cumulative logistic distribution.
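A small sketch of that comparison, evaluating the logistic (logit) and standard normal (probit) cumulative distribution functions over (-4, 4); SciPy is assumed to be available.

```python
# Compare the logit (logistic CDF) and probit (normal CDF) link functions.
import numpy as np
from scipy.special import expit    # logistic sigmoid, i.e. the logistic CDF
from scipy.stats import norm

z = np.linspace(-4, 4, 9)
print("  z    logit   probit")
for zi, logit_p, probit_p in zip(z, expit(z), norm.cdf(z)):
    print(f"{zi:5.1f}  {logit_p:.3f}   {probit_p:.3f}")
```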
Stepwise Regression Model
The stepwise regression model is a step-by-step, iterative construction of a regression model. It is a semi-automatic process for selecting independent variables, carried out in one of two ways – by adding independent variables to the regression model one at a time if they are statistically significant, or by including all the independent variables initially and then removing them one at a time if they prove to be statistically insignificant.
The stepwise regression model is a much more powerful tool than other multiple regression models and comes in handy when working with a large number of potential independent variables and/or when fine-tuning a model by selecting variables in or out.
The major approaches to the stepwise regression model are as follows:
- Forward Selection – starting with no variables initially, testing the addition of a new variable, and adding the variable if it proves to improve the model (see the sketch after this list)
- Backward Elimination – starting with all the variables initially, testing the elimination of each variable, and removing the variable if doing so proves to improve the model
- Bidirectional Elimination – combination of the above
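Below is a minimal forward-selection sketch using statsmodels (assumed available). The data are simulated, the 0.05 entry threshold is an illustrative choice, and this is only one of several ways the procedure can be implemented.

```python
# Forward selection: repeatedly add the candidate predictor with the smallest
# p-value, as long as it is statistically significant at the chosen level.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))                              # 5 candidate predictors
y = 2.0 + 1.5 * X[:, 0] - 0.8 * X[:, 2] + rng.normal(size=n)

selected, remaining = [], list(range(X.shape[1]))
while remaining:
    best_p, best_j = None, None
    for j in remaining:
        fit = sm.OLS(y, sm.add_constant(X[:, selected + [j]])).fit()
        p_value = fit.pvalues[-1]                        # p-value of the newly tried variable
        if best_p is None or p_value < best_p:
            best_p, best_j = p_value, j
    if best_p < 0.05:                                    # keep it only if significant
        selected.append(best_j)
        remaining.remove(best_j)
    else:
        break

print("selected predictors:", selected)                  # expected: columns 0 and 2
```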
Ref Example : http://www.geog.leeds.ac.uk/courses/other/statistics/spss/stepwise/
Multivariate adaptive regression splines
In statistics, multivariate adaptive regression splines (MARS) is a form of regression analysis introduced by Jerome H. Friedman in 1991. It is a non-parametric regression technique and can be seen as an extension of linear models that automatically models nonlinearities and interactions between variables.
The term "MARS" is trademarked and licensed to Salford Systems. In order to avoid trademark infringements, many open source implementations of MARS are called "Earth".
The MARS model
MARS builds models of the form
\hat{f}(x) = \sum_{i=1}^{k} c_i B_i(x)
The model is a weighted sum of basis functions B_i(x), where each c_i is a constant coefficient.
Each basis function takes one of the following three forms:
1) a constant 1. There is just one such term, the intercept.
2) a hinge function. A hinge function has the form max(0, x - c) or max(0, c - x), where c is a constant called the knot. MARS automatically selects variables and the values of those variables for the knots of the hinge functions (see the sketch after this list).
3) a product of two or more hinge functions. These basis functions can model interaction between two or more variables.
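To make the hinge idea concrete, here is a tiny hand-built example; the knots (3 and 6) and coefficients are arbitrary illustrative values, not the output of a fitted MARS/"Earth" model.

```python
# Evaluate a MARS-style model: an intercept plus a weighted sum of hinge functions.
import numpy as np

def hinge(x, knot, direction=+1):
    """max(0, x - knot) if direction is +1, else max(0, knot - x)."""
    return np.maximum(0.0, direction * (x - knot))

x = np.linspace(0.0, 10.0, 11)
# f(x) = 5 + 2*max(0, x - 3) - 1.5*max(0, 6 - x)
f = 5 + 2 * hinge(x, 3.0, +1) - 1.5 * hinge(x, 6.0, -1)
print(np.round(f, 2))
```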
Detail Ref : http://www.slac.stanford.edu/pubs/slacpubs/4750/slac-pub-4960.pdf
Definition of a LOESS model
LOESS, originally proposed by Cleveland (1979) and further developed by Cleveland and Devlin (1988), specifically denotes a method that is also known as locally weighted polynomial regression. At each point in the range of the data set a low-degree polynomial is fitted to a subset of the data, with explanatory variable values near the point whose response is being estimated. The polynomial is fitted using weighted least squares, giving more weight to points near the point whose response is being estimated and less weight to points further away. The value of the regression function for the point is then obtained by evaluating the local polynomial using the explanatory variable values for that data point. The LOESS fit is complete after regression function values have been computed for each of the data points. Many of the details of this method, such as the degree of the polynomial model and the weights, are flexible.
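A minimal LOESS/LOWESS sketch using the lowess smoother from statsmodels (assumed available); the noisy sine data and the span frac=0.3 are illustrative choices.

```python
# Locally weighted regression: each fitted value comes from a weighted
# least-squares fit to the nearest frac*n points.
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(1)
x = np.linspace(0.0, 2 * np.pi, 100)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

smoothed = lowess(y, x, frac=0.3)       # returns an array of (x, fitted y) pairs
print(smoothed[:5])
```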
Ref : http://www.ats.ucla.edu/stat/sas/library/loesssugi.pdf
Jackknife Resampling
In statistics, the jackknife is a resampling technique that is especially useful for variance and bias estimation. The jackknife predates other common resampling methods such as the bootstrap. The jackknife estimate of a parameter is found by systematically leaving out each observation from the dataset, calculating the estimate on the remaining observations, and then averaging these calculations. Given a sample of size n, the jackknife estimate is found by aggregating the estimates from each of the n leave-one-out subsamples.
The jackknife technique was developed by Maurice Quenouille (1949, 1956). John Tukey (1958) expanded on the technique and proposed the name "jackknife" since, like a Boy Scout's jackknife, it is a "rough and ready" tool that can solve a variety of problems even though specific problems may be more efficiently solved with a purpose-designed tool.
The jackknife is a linear approximation of the bootstrap.
Estimation
The jackknife estimate of a parameter can be found by estimating the parameter for each subsample that omits the i-th observation. For the sample mean, the i-th leave-one-out estimate (say \bar{x}_i) is
\bar{x}_i = \frac{1}{n-1} \sum_{j \neq i} x_j
Variance Estimation
An estimate of the variance of an estimator can be calculated using the jackknife technique.
\operatorname{Var}_{\text{jackknife}} = \frac{n-1}{n} \sum_{i=1}^{n} \left( \bar{x}_i - \bar{x}_{(\cdot)} \right)^2

where \bar{x}_i is the parameter estimate based on leaving out the i-th observation, and \bar{x}_{(\cdot)} is the mean of these leave-one-out estimates (the estimator based on all of the subsamples).
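A minimal jackknife sketch for the sample mean on made-up numbers, following the leave-one-out formulas above.

```python
# Jackknife for the sample mean: leave each observation out once,
# average the leave-one-out means, and plug them into the variance formula.
import numpy as np

x = np.array([4.0, 7.0, 1.0, 9.0, 5.0, 3.0])
n = x.size

loo_means = np.array([np.delete(x, i).mean() for i in range(n)])   # x_bar_i
jack_estimate = loo_means.mean()                                   # x_bar_(.)
jack_var = (n - 1) / n * np.sum((loo_means - jack_estimate) ** 2)

print("jackknife estimate:", jack_estimate)
print("jackknife variance:", jack_var)
```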
Ref. : https://www.utdallas.edu/~herve/abdi-Jackknife2010-pretty.pdf
For Part II, please click :
https://www.garudax.id/pulse/machine-learning-algorithm-regularization-part-2-12-abhay-kumar?trk=mp-author-card