Intro to Machine Learning
Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.
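There are many types of ML systems, so it is useful to classify them into broad categories based on criteria such as how much human supervision they get during training.
In this article we will discuss only this first criterion, which gives four categories: supervised, unsupervised, semi-supervised, and reinforcement learning.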
1. Supervised:
In supervised learning, the training set you feed to the algorithm includes the desired solutions, called labels.
A typical supervised learning task is classification.
A spam filter is a good example of this: it is trained on many example emails along with their class (spam or ham), and it must learn how to classify new emails.
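To make the spam filter idea concrete, here is a minimal sketch of a word-count classifier (a small naive Bayes model) in plain Python. The four training emails, the function names train and classify, and the choice of naive Bayes are all assumptions made for illustration; a real filter would use far more data and a tested library.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy training set: (email text, label) pairs.
TRAIN = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting at noon", "ham"),
    ("lunch meeting tomorrow", "ham"),
]

def train(examples):
    """Count words per class and how many emails each class has."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, class_counts

def classify(text, word_counts, class_counts):
    """Return the class with the highest naive Bayes log-score."""
    vocab = set().union(*word_counts.values())
    total = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        n_words = sum(word_counts[label].values())
        # Start from the prior: how common this class is in the training set.
        score = math.log(class_counts[label] / total)
        for word in text.lower().split():
            # Laplace smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

word_counts, class_counts = train(TRAIN)
print(classify("cheap money", word_counts, class_counts))
print(classify("meeting tomorrow", word_counts, class_counts))
```

The labels are what make this supervised: the algorithm never has to guess which training emails are spam, it only has to generalize from them to new emails.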
Another typical task is to predict a target numeric value, such as the price of a car, given a set of features (mileage, age, brand, etc.) called predictors.
This sort of task is called regression. To train the system, you need to give it many examples of cars, including both their predictors and their labels (i.e., their prices).
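As a rough illustration of regression, the sketch below fits a straight line to car prices using a single predictor (mileage) with ordinary least squares. The data points are invented for the example, and fit_line and predict_price are hypothetical helper names, not part of any library.

```python
# Hypothetical toy data: (mileage in thousands of km, price in thousands).
cars = [(20, 18.0), (50, 15.0), (80, 12.0), (110, 9.0)]

def fit_line(points):
    """Ordinary least squares for one predictor: price = intercept + slope * mileage."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(cars)

def predict_price(mileage):
    """Predict the price of an unseen car from its mileage."""
    return intercept + slope * mileage

print(slope, intercept)
print(predict_price(65))
```

Here the predictor is the mileage and the label is the price. With several predictors (age, brand, and so on) the same idea applies, but the fit is usually left to a library.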
Note that some regression algorithms can be used for classification as well, and vice versa. For example, Logistic Regression is commonly used for classification, as it can output a value that corresponds to the probability of belonging to a given class.
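The following sketch shows how Logistic Regression can act as a classifier: it learns a weight and a bias by gradient descent, outputs a probability, and then thresholds it at 0.5. The feature (number of suspicious words per email), the data, the learning rate, and the number of epochs are all assumptions chosen to keep the example small.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical data: (suspicious words in an email, label) with 1 = spam, 0 = ham.
data = [(0, 0), (1, 0), (2, 0), (4, 1), (5, 1), (6, 1)]

def fit_logistic(points, lr=0.3, epochs=5000):
    """Fit weight w and bias b by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in points:
            error = sigmoid(w * x + b) - y  # predicted probability minus label
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / len(points)
        b -= lr * grad_b / len(points)
    return w, b

w, b = fit_logistic(data)

def spam_probability(x):
    """Estimated probability that an email with x suspicious words is spam."""
    return sigmoid(w * x + b)

def is_spam(x):
    """Turn the regression output into a class by thresholding at 0.5."""
    return spam_probability(x) >= 0.5

print(is_spam(1), is_spam(5))
```

The model itself predicts a number (a probability), which is why it is called regression; the threshold is what turns it into a classifier.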
Here are some of the most important supervised learning algorithms:
• Linear Regression.
• Logistic Regression.
• Support Vector Machines (SVMs).
• Decision Trees and Random Forests.
• Neural Networks.
2. Unsupervised:
In unsupervised learning, as you might guess, the training data is unlabeled. The system tries to learn without a teacher.
For example, say you have a lot of data about your blog’s visitors. You may want to
run a clustering algorithm to try to detect groups of similar visitors.
At no point do you tell the algorithm which group a visitor belongs to: it finds those connections without your help.
For example, it might notice that 40% of your visitors are males who love comic books and generally read your blog in the evening, while 20% are young sci-fi lovers who visit during the weekends. If you use a hierarchical clustering algorithm, it may also subdivide each group into smaller groups. This may help you target your posts for each group.
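The clustering idea can be sketched with k-means, one common clustering algorithm (the text above does not name a specific one, so this choice is an assumption). Each visitor is described by two invented features, age and usual hour of visit, and the initial centroids are simply the first k points to keep the run deterministic.

```python
import math

# Hypothetical visitors: (age, usual hour of visit). No labels are provided.
visitors = [(22, 20), (45, 9), (25, 21), (48, 10), (23, 22), (50, 8)]

def kmeans(points, k, iterations=10):
    """Group points into k clusters by alternating assignment and centroid updates."""
    centroids = list(points[:k])  # simplistic initialization, an assumption
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

labels, centroids = kmeans(visitors, k=2)
print(labels)
print(centroids)
```

Note that the cluster numbers carry no meaning of their own: the algorithm only says which visitors belong together, and it is up to you to interpret each group.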
3. Semi-supervised:
Since labeling data is usually time-consuming and costly, you will often have plenty of unlabeled instances and only a few labeled ones. Some algorithms can deal with data that’s partially labeled. This is called semi-supervised learning.
Some photo-hosting services, such as Google Photos, are good examples of this. Once
you upload all your family photos to the service, it automatically recognizes that the
same person A shows up in photos 1, 5, and 11, while another person B shows up in
photos 2, 5, and 7. This is the unsupervised part of the algorithm (clustering). Now all
the system needs is for you to tell it who these people are. Just add one label per person
and it is able to name everyone in every photo, which is useful for searching photos.
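The second half of that process, spreading one label per cluster to every photo, can be sketched as follows. The clustering step is assumed to have already run, so photo_clusters is a hypothetical stand-in for its output (using the photo numbers from the example above), and the names Alice and Bob are invented.

```python
# Hypothetical output of the unsupervised step: photo id -> face clusters found in it.
photo_clusters = {1: ["A"], 2: ["B"], 5: ["A", "B"], 7: ["B"], 11: ["A"]}

# The only human-provided labels: one name per cluster.
names = {"A": "Alice", "B": "Bob"}

def name_faces(photo_clusters, names):
    """Propagate each cluster's single label to every photo containing that cluster."""
    return {
        photo: [names.get(cluster, "unknown") for cluster in clusters]
        for photo, clusters in photo_clusters.items()
    }

def search(named_photos, person):
    """Return the ids of all photos in which the given person appears."""
    return sorted(photo for photo, people in named_photos.items() if person in people)

named_photos = name_faces(photo_clusters, names)
print(search(named_photos, "Alice"))
print(search(named_photos, "Bob"))
```

Two labels were enough to name six faces across five photos, which is the appeal of combining a little labeled data with a lot of unlabeled data.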
4. Reinforcement Learning:
Reinforcement Learning is a very different beast. The learning system, called an agent
in this context, can observe the environment, select and perform actions, and get
rewards in return (or penalties in the form of negative rewards).
It must then learn by itself the best strategy, called a policy, to get the most reward over time.
A policy defines what action the agent should choose when it is in a given situation.
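As an illustration of an agent learning a policy from rewards, here is a small sketch using tabular Q-learning, one well-known Reinforcement Learning algorithm (chosen here as an assumption, since the text does not name one). The environment is an invented five-cell corridor: the agent starts in cell 0, gets a reward of 1 for reaching cell 4, and a small penalty of -0.01 for every other step.

```python
import random

N_STATES = 5         # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)   # step left or step right

def step(state, action):
    """Environment: move within the corridor and return (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else -0.01
    return next_state, reward

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values with epsilon-greedy Q-learning."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate toward reward plus discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if state == N_STATES - 1:
                break
    return q

q = train()
# The learned policy: the best-valued action in each non-goal cell.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

The policy here is just a lookup table from situation (cell) to action (left or right). Nobody tells the agent that moving right is correct; it infers that from the rewards it collects.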