What is Supervised Learning?
In supervised learning, you train a machine on "labelled" data: data in which each example is already tagged with the correct answer. It can be compared to learning in the presence of a supervisor or teacher. Supervised learning algorithms learn from labelled training data in order to predict outcomes for new, unseen data. Building, scaling, and deploying accurate supervised machine learning models takes time and the technical expertise of a team of highly experienced data scientists. Data scientists must also rebuild their models as the underlying data changes, so that the insights they provide remain accurate.
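The idea above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production method: a 1-nearest-neighbour classifier "trained" on made-up labelled pairs, then asked to predict labels for inputs it has never seen.

```python
def predict(training_data, x):
    """Return the label of the training example whose feature is closest to x."""
    nearest = min(training_data, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

# Labelled training data: (feature, label) pairs, invented for illustration.
labelled = [(1.0, "small"), (2.0, "small"), (8.0, "large"), (9.0, "large")]

print(predict(labelled, 1.5))  # -> small
print(predict(labelled, 8.5))  # -> large
```

Even this tiny example has the essential shape of supervised learning: labelled examples in, a prediction rule out.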
How does it work?
Suppose we are given N training examples of the form (x_1, y_1), …, (x_N, y_N), where x_i is the feature vector of the i-th example and y_i is its label. A learning algorithm seeks a function g: X → Y,
where X is the input space and Y is the output space. The function g is an element of some space of possible functions G, also called the hypothesis space. It is often convenient to represent g using a scoring function f: X × Y → ℝ, with g defined as returning the value of y that gives the highest score: g(x) = arg max_y f(x, y).
Let F denote the space of scoring functions. Although G and F can be any spaces of functions, many learning algorithms are probabilistic models, in which g takes the form of a conditional probability model g(x) = P(y | x), or f takes the form of a joint probability model f(x, y) = P(x, y).
For example, naive Bayes and linear discriminant analysis are joint probability models, whereas logistic regression is a conditional probability model.
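The argmax relationship between f and g can be shown directly. In this sketch the scoring function f and its labels are invented purely for illustration; any real learning algorithm would fit f from data instead.

```python
LABELS = ["cat", "dog"]

def f(x, y):
    """Toy scoring function: assigns each (input, label) pair a real-valued score."""
    # Made-up behaviour: "cat" scores highly near x = 1, "dog" near x = 5.
    return -abs(x - 1.0) if y == "cat" else -abs(x - 5.0)

def g(x):
    """g(x) = arg max over y of f(x, y): return the highest-scoring label."""
    return max(LABELS, key=lambda y: f(x, y))

print(g(0.5))  # -> cat
print(g(6.0))  # -> dog
```

The same argmax pattern applies whether f is a toy function like this one or the conditional probability P(y | x) output by a trained model.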
There are two basic approaches to choosing f or g: empirical risk minimization and structural risk minimization. Empirical risk minimization seeks the function that best fits the training data. Structural risk minimization adds a penalty function that controls the bias/variance tradeoff.
In both cases, it is assumed that the training set consists of a sample of independent and identically distributed pairs (x_i, y_i). To measure how well a function fits the training data, a loss function L: Y × Y → ℝ≥0 is defined. For the training example (x_i, y_i), the loss of predicting the value ŷ is L(y_i, ŷ).
The risk R(g) of a function g is defined as the expected loss of g. It can be estimated from the training data as the empirical risk: R_emp(g) = (1/N) Σ L(y_i, g(x_i)), summed over i = 1, …, N.
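The empirical risk is just the average loss over the training pairs. The sketch below assumes squared loss, L(y, ŷ) = (y − ŷ)², and a made-up model and dataset; structural risk minimization would add a penalty term to the quantity this function computes.

```python
def squared_loss(y, y_hat):
    """L(y, y_hat) = (y - y_hat)^2, a common loss for regression."""
    return (y - y_hat) ** 2

def empirical_risk(g, training_data, loss):
    """R_emp(g) = (1/N) * sum of loss(y_i, g(x_i)) over the training pairs."""
    return sum(loss(y, g(x)) for x, y in training_data) / len(training_data)

# Toy model and invented training data for illustration.
g = lambda x: 2 * x
data = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5)]

print(empirical_risk(g, data, squared_loss))  # -> 0.1666...
```

Empirical risk minimization would search over candidate functions g and keep the one for which this average is smallest.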
Where is Supervised Learning applicable?
- Bioinformatics
- Cheminformatics
- Quantitative structure-activity relationship
- Database marketing
- Handwriting recognition
- Information retrieval
- Learning to rank
- Information extraction
- Object recognition in computer vision
- Optical character recognition
- Spam detection
- Pattern recognition
- Speech recognition
Advantages:
- Supervised learning lets you collect data and produce output informed by previous experience.
- It uses experience to optimize performance criteria.
- Supervised machine learning helps solve many types of real-world computational problems.
Disadvantages:
- Decision boundaries can be overtrained if the training set does not contain examples you want to have in a class.
- While training the classifier, you have to select many good examples from each class.
- Classifying big data is difficult.
- Training for supervised learning takes a lot of computation time.