Machine Learning - The shallow end of the algorithm pool, linear regression
The Machine Learning Guide podcast by Tyler Renelle and the Machine Learning course on Coursera taught by Andrew Ng allow those new to Machine Learning to quickly grasp concepts once thought reserved for academics in white lab coats. A very nice way to ease in is linear regression, which lets you predict a value based on the features of a data set.
Imagine having a listing of recently sold homes loaded into a spreadsheet. The table would have a column for the number of floors in each home, its square footage, its age, and its selling price. Using this data and the linear regression algorithm, one can "teach" a computer to predict prices.
Below is a deliberately simplified model of the process and concepts involved, illustrated with the home-price scenario.
Predicting
Essentially, a coefficient is chosen at random for each attribute (number of floors, square footage, age of home). Each attribute value is multiplied by its coefficient, and the resulting products are added together. The computer then uses this sum as its predicted price for the home.
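To make the multiply-and-sum step concrete, here is a minimal Python sketch. The function name `predict`, the home's attribute values, and the starting coefficients are all made up for illustration; in the process described above the coefficients would start out random.

```python
def predict(coefficients, features):
    """Multiply each feature by its coefficient and add up the products."""
    return sum(c * x for c, x in zip(coefficients, features))


# One hypothetical home: 2 floors, 1500 square feet, 10 years old.
home = [2.0, 1500.0, 10.0]

# Starting coefficients would normally be random; fixed here so the
# example is repeatable. These numbers are illustrative assumptions.
coefficients = [1000.0, 100.0, -500.0]

predicted_price = predict(coefficients, home)
print(predicted_price)  # 147000.0
```

Like the simplified description, this sketch leaves out the constant (intercept) term that a fuller model would normally include.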
Measuring
The program then compares the predicted value with the actual value (the price of the home in this example). The difference is said to be the "Cost" or "Error Rate" for the chosen coefficients.
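A single number is usually needed to summarize the error across all homes. One common choice, assumed here rather than taken from the description above, is to square each difference and average the results (mean squared error). The prices below are invented for illustration.

```python
def mean_squared_error(predictions, actual_prices):
    """Average of the squared differences between predicted and actual prices."""
    squared = [(p - a) ** 2 for p, a in zip(predictions, actual_prices)]
    return sum(squared) / len(squared)


# Hypothetical predicted and actual selling prices for two homes.
predictions = [147000.0, 210000.0]
actual_prices = [150000.0, 200000.0]

cost = mean_squared_error(predictions, actual_prices)
print(cost)  # 54500000.0
```

Squaring keeps over-estimates and under-estimates from cancelling each other out, and it penalizes large misses more heavily than small ones.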
Adjusting
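The program then uses an algorithm (commonly gradient descent) to take the error of its prediction into account and adjust the coefficient values slightly. The learning rate determines how aggressively adjustments are made. This is made possible by the magic of calculus: the derivative of the cost with respect to each coefficient tells the program which direction to nudge that coefficient.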
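Here is a rough sketch of a single adjustment, assuming gradient descent on a mean-squared-error cost. The function name `gradient_step` and the tiny two-feature data set are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
def gradient_step(coefficients, rows, prices, learning_rate):
    """Return new coefficients after one gradient-descent adjustment.

    Assumes the cost is the mean squared error, whose derivative with
    respect to coefficient j is (2 / m) * sum(error * feature_j).
    """
    m = len(rows)
    gradients = [0.0] * len(coefficients)
    for features, price in zip(rows, prices):
        error = sum(c * x for c, x in zip(coefficients, features)) - price
        for j, x in enumerate(features):
            gradients[j] += 2.0 * error * x / m
    # Move each coefficient a small step against its gradient.
    return [c - learning_rate * g for c, g in zip(coefficients, gradients)]


# Tiny made-up data set: two examples with two features each.
rows = [[1.0, 2.0], [2.0, 1.0]]
prices = [5.0, 4.0]

adjusted = gradient_step([0.0, 0.0], rows, prices, learning_rate=0.1)
print(adjusted)
```

With these numbers the gradients work out to -13 and -14, so both coefficients move up from 0 to roughly 1.3 and 1.4. A larger learning rate would take a bigger step in the same direction.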
Lather, rinse, repeat
The process is repeated until the change in measured error from one iteration to the next is so small as to be insignificant. At this point the coefficient values have become your model for predicting new home prices, within a degree of error indicated by the "Cost" value.
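Putting the predict, measure, and adjust steps into a loop might look like the sketch below. Everything specific in it is an assumption for illustration: the function names, the learning rate, the stopping tolerance, and the five invented homes. Features are expressed in small units (floors, thousands of square feet, decades of age) and prices in thousands of dollars so that a plain learning rate converges, and the prices were generated from the coefficients 20, 100 and -10 so that there is a known answer to recover.

```python
def predict(coefficients, features):
    """Sum of each coefficient multiplied by its feature value."""
    return sum(c * x for c, x in zip(coefficients, features))


def cost(coefficients, rows, prices):
    """Mean squared error over all examples (an assumed cost function)."""
    errors = [predict(coefficients, r) - p for r, p in zip(rows, prices)]
    return sum(e ** 2 for e in errors) / len(errors)


def train(rows, prices, learning_rate=0.05, tolerance=1e-12,
          max_iterations=100_000):
    """Repeat predict / measure / adjust until the cost stops changing."""
    m = len(rows)
    # Zeros instead of random starting values keep the run repeatable.
    coefficients = [0.0] * len(rows[0])
    previous = cost(coefficients, rows, prices)
    current = previous
    for _ in range(max_iterations):
        gradients = [0.0] * len(coefficients)
        for features, price in zip(rows, prices):
            error = predict(coefficients, features) - price
            for j, x in enumerate(features):
                gradients[j] += 2.0 * error * x / m
        coefficients = [c - learning_rate * g
                        for c, g in zip(coefficients, gradients)]
        current = cost(coefficients, rows, prices)
        if abs(previous - current) < tolerance:
            break  # improvement is now insignificant
        previous = current
    return coefficients, current


# Invented homes: [floors, square feet in thousands, age in decades].
rows = [
    [1.0, 1.0, 3.0],
    [2.0, 1.5, 1.0],
    [1.0, 2.0, 0.5],
    [2.0, 2.5, 2.0],
    [3.0, 3.0, 4.0],
]
# Prices in thousands of dollars, generated as 20*floors + 100*sqft - 10*age.
prices = [90.0, 180.0, 215.0, 270.0, 320.0]

learned, final_cost = train(rows, prices)
print([round(c, 2) for c in learned], final_cost)
```

On this noise-free toy data the learned coefficients should land very close to 20, 100 and -10. Real sales data would not fit perfectly, so the final cost would settle at some value above zero.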
The same process, abstracted to any data set and expressed with formulas:
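As a sketch in commonly used notation (the symbols are assumptions chosen for this summary): let m be the number of examples, n the number of features, x_j^{(i)} the value of feature j in example i, y^{(i)} the actual value for example i, \theta_j the coefficient for feature j, and \alpha the learning rate. Predicting, measuring and adjusting then read as follows, with the adjustment repeated until J stops changing noticeably.

```latex
% Predicting: multiply each feature by its coefficient and sum
\hat{y}^{(i)} = \sum_{j=1}^{n} \theta_j \, x_j^{(i)}

% Measuring: mean squared error over all m examples
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^2

% Adjusting: step each coefficient against the derivative of the cost
\theta_j \leftarrow \theta_j - \alpha \, \frac{\partial J}{\partial \theta_j}
         = \theta_j - \frac{2\alpha}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right) x_j^{(i)}
```

Some courses scale the cost by 1/(2m) instead of 1/m, which removes the factor of 2 from the update without changing where the minimum lies.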
Obviously there is a lot more detail than this short example shows, but it should be a good summary to help guide you through the details. For more in-depth explanations, as well as instruction on how to actually write a program that does this, I suggest the following:
- The Machine Learning Guide podcast by Tyler Renelle - The link takes you directly to the podcast about linear regression.
- Machine Learning course on Coursera taught by Andrew Ng - The link takes you to the Coursera page for the course, and linear regression is one of the first concepts covered.