Ingredients Of Machine Learning

Every recipe consists of a set of ingredients that makes it unique; these ingredients are the reason the dish tastes the way it does.

Similarly, a proficient machine learning model requires a certain set of ingredients, which in turn determine its success.

In this article we will take a look at the six ingredients (represented as jars) that constitute a machine learning model.

We will fill in the labels on these jars over the course of this article.

DATA

As obvious as it seems, data plays a profound role in any machine learning model, and in this day and age many variations and types of data are readily available.

Some examples are:

  • Structured data (from tables)
  • Unstructured data (e.g., raw product reviews from websites like Amazon)
  • Video data (from websites like Facebook)
  • Audio data, etc.

At this point we need to understand that even though so many sorts of data are available, machine learning requires a specific type of data.

Let us understand more about the kind of data we require with the help of an example application.

In this application, based on the medical image provided, we want to find out whether any medical anomaly is present.

Notice that the data here has two parts:

  1. Numerically encoded input for the image (pixel values of the medical image, represented as X)
  2. Output declaring whether there is a medical anomaly (Y = 1) or not (Y = 0)

For the data to be useful to our machine learning model (which will then be trained on it), we require an output for each corresponding input (in the case of supervised learning).
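As a minimal sketch, the two parts of such a supervised dataset can be represented as paired lists (the pixel values and labels below are made up for illustration):

```python
# A toy supervised dataset for the medical-anomaly example.
# Each input in X is a hypothetical flattened grayscale image,
# and the matching entry in y says whether an anomaly is present.
X = [
    [0.1, 0.2, 0.1, 0.0],  # pixel intensities for image 1
    [0.9, 0.8, 0.7, 0.9],  # pixel intensities for image 2
    [0.2, 0.1, 0.0, 0.1],  # pixel intensities for image 3
]
y = [0, 1, 0]  # Y = 1 means an anomaly is present, Y = 0 means none

# Supervised learning needs exactly one output per input.
assert len(X) == len(y)
```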

Now, if at any point we require the application to tell us not only about the existence of a medical anomaly but also where the anomaly is located, our training data would also have to include the locations of the anomalies.

This indicates a relation between the kind of output we require and the particular type of data we would need for our machine learning model.

The data can take many forms: for sentiment analysis, the input could be comments, which would need to be converted into numerical quantities (this is where NLP comes in), and the output a single 1 or 0 for a positive or negative comment.
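A bag-of-words count vector is one simple way to sketch that conversion (the comments and vocabulary here are made up; a real NLP pipeline would be far richer):

```python
# Convert raw comments into numeric vectors via word counts,
# paired with a 1/0 sentiment label per comment.
comments = ["great product love it", "bad quality waste of money"]
labels = [1, 0]  # 1 = positive, 0 = negative

# Vocabulary: every distinct word seen across the comments.
vocab = sorted({word for c in comments for word in c.split()})

def encode(comment):
    # Count how often each vocabulary word occurs in the comment.
    words = comment.split()
    return [words.count(v) for v in vocab]

X = [encode(c) for c in comments]  # numeric inputs for the model
```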

TASK


Now that we understand and have obtained the appropriate data for our machine learning model, let's look at our second ingredient, the "task". What we want to do with our data defines the purpose of our model.

Let's consider a product-selling website like Amazon with the following data available as input:

  1. Visual data (in the form of product images)
  2. Structured data (in the form of tabular product descriptions)
  3. Unstructured data (in the form of user comments or product descriptions provided by vendors)

In a situation like this, when we have an abundance of data at our disposal, it becomes crucial to recognize the kind of task we want to perform.

Having understood this, let's try to identify the tasks we can perform in the aforementioned example:

  1. With the unstructured product description as our input, we can formulate the tabular product description as our output
  2. With user reviews and the tabular product description as our input, we can create FAQs as our output
  3. With user reviews, the tabular product description and FAQs as our input, we can answer customer questions as our output

Now that we are clear on the tasks we can perform, let's dive deeper into the different classes of tasks.

TYPES OF TASK

Supervised

Our first set of tasks is called supervised tasks, where a certain response (output) is always associated with the input; in our medical anomaly example, a response of 1 was associated with images that depicted an anomaly. The machine uses the set of inputs and outputs to train itself.

Under supervised learning we can perform two types of task, i.e. classification and regression.

  • Classification

In classification we try to identify whether a test input belongs to a certain class. For example, we can take a set of images (as RGB pixel values) and classify each as to whether or not it contains any sort of text.
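A minimal sketch of such a classifier, here a 1-nearest-neighbour rule on made-up two-dimensional feature vectors standing in for pixel data:

```python
# Toy training set: feature vectors with labels
# (0 = image contains no text, 1 = image contains text).
train_X = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
train_y = [0, 0, 1, 1]

def classify(x):
    # Predict the label of the nearest training point (squared distance).
    dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in train_X]
    return train_y[dists.index(min(dists))]

pred = classify([0.95, 0.95])  # close to the "contains text" examples
```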



  • Regression

In regression we try to obtain real values as output for a test input, provided the machine has learned from a dataset with a numerical output corresponding to each input. In our example, we try to locate the coordinates where we first encounter text.
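As a sketch of regression, here is a closed-form least-squares fit of a line y = m·x + c to made-up numeric data:

```python
# Toy data that roughly follows y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 4.0, 6.2, 7.9]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form least-squares solution for slope m and intercept c.
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
c = mean_y - m * mean_x

prediction = m * 5.0 + c  # real-valued output for a new input
```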

Unsupervised

Under the unsupervised set of tasks, we do not have labeled responses (outputs) corresponding to our inputs. Unsupervised learning comprises the following tasks:

  • Clustering

As the name suggests, in clustering we group the unlabeled inputs into clusters of items (e.g., images) exhibiting similar behavior.
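A bare-bones k-means sketch (k = 2) on made-up one-dimensional points shows the idea: assign each point to its nearest centroid, then move each centroid to the mean of its cluster:

```python
# Unlabeled points that visibly form two groups.
points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
centroids = [points[0], points[3]]  # naive initialisation

for _ in range(5):  # a few refinement iterations
    clusters = [[], []]
    for p in points:
        # Assign the point to the nearer of the two centroids.
        i = 0 if abs(p - centroids[0]) <= abs(p - centroids[1]) else 1
        clusters[i].append(p)
    # Move each centroid to the mean of its assigned points.
    centroids = [sum(c) / len(c) for c in clusters]
```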



  • Generation

Here we try to generate a new element similar to the given input. E.g., a bot looks at some tweets as input data and generates a new tweet that is on par with the input.

MODEL


Now that we have identified our data and the tasks to perform, let's talk about our third ingredient, the "model".

Our data had some values x as input with corresponding labels as output.

Now it is safe to assume that there is some mathematical relationship between our input and its corresponding labelled response. So there is some function y = f(x) which maps the input to the corresponding output. In practical scenarios, though, we don't know what that function is, so instead, after looking at the data, we devise an approximate relation.

Let's understand this in more practical detail.

Assume we have the points of the dataset plotted; our aim is to devise a function that best, or at least approximately, describes the relation between the y and x values.

Initially, let's assume that the relationship between the x and y values is linear, y = mx + c.

With the data provided, we will try to learn the values of m and c, which will lead us to the conclusion that no matter what line we form, no line can pass through all these data points.

Next, we try a quadratic function, y = ax² + bx + c, and try to learn the values of a, b and c, but here as well, no matter what the values, our curve cannot pass through most of the points. We conclude that our function is still not complex enough to capture the true relationship.

Similarly, we can continue this process until we reach a degree-25 polynomial, which does not completely, but does approximately, capture the relationship between x and y.
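This growth in flexibility can be sketched on made-up data generated by y = x²: the best-fitting straight line still leaves a large error, while a quadratic with the right coefficients fits the data exactly:

```python
# Candidate model families of increasing flexibility.
def linear(x, m, c):
    return m * x + c

def quadratic(x, a, b, c):
    return a * x**2 + b * x + c

# Toy data generated by the true relation y = x^2.
data = [(x, float(x * x)) for x in range(-3, 4)]

def mse(f):
    # Mean squared error of a fitted model over the dataset.
    return sum((f(x) - y) ** 2 for x, y in data) / len(data)

# By symmetry, the best least-squares line here is the flat line y = 4;
# the quadratic can reproduce the data exactly.
line_err = mse(lambda x: linear(x, 0.0, 4.0))
quad_err = mse(lambda x: quadratic(x, 1.0, 0.0, 0.0))
```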


Now, these functions that we tested are known as models, which, as the name suggests, try to model the relationship between y and x.

An example of such a family of functions, the neural network family, is depicted in the pink box.

LOSS FUNCTION

From the model section, we can conclude that we can test an array of functions as our model. This raises the question: how would we rank these functions as better or worse?


In the above image, we have our input x and output y. Say there are three people who have proposed three different polynomials as models. Our aim is to find the model best suited to the true relation between x and y. How do we do that?

This is where our fourth ingredient, the loss function, comes in.

The loss function helps us determine the model closest to the true relation between the input and the output.

If we calculate the loss for the three proposed models above, the losses will look something like this.

Now it is evident that the first proposed model has the least error (L1) and hence can be declared the best proposed model among the three.

A few examples of loss functions:

  • Squared Error (Described above)
  • Cross Entropy
  • Kullback–Leibler (KL) Divergence
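The ranking step can be sketched with squared-error loss (the data and the three proposed models below are made up):

```python
# Toy data whose true relation is y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# Three hypothetical proposed models.
models = {
    "model_1": lambda x: 2.0 * x + 1.0,  # matches the true relation
    "model_2": lambda x: 3.0 * x,        # wrong slope
    "model_3": lambda x: x + 2.0,        # wrong slope and intercept
}

def squared_error_loss(f):
    # Sum of squared differences between predictions and targets.
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys))

losses = {name: squared_error_loss(f) for name, f in models.items()}
best = min(losses, key=losses.get)  # smallest loss wins
```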

LEARNING ALGORITHM


Now let's say we have an n-th degree polynomial as the model, and we have our set of x and y values. We have another hurdle to cross: finding the parameters, i.e. the coefficients of x. We could use a brute-force method where we fix (n-1) coefficients and vary the last coefficient to check for the value where the loss is minimum, and then repeat this process for every coefficient.

But in a real-world scenario, this method is absurd. Like "a man in an iron suit" absurd.

So our goal is to find an efficient way to compute these coefficients (a, b, c, etc.) given the dataset (x and y), the model, and the loss function L, such that L is minimized. This can be framed as an optimization problem, to be tackled with optimization solvers.
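Gradient descent, the most common of these solvers, can be sketched for learning the line y = mx + c on made-up data: repeatedly nudge each coefficient against the gradient of the loss:

```python
# Toy data with true relation y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

m, c = 0.0, 0.0  # start from arbitrary parameter values
lr = 0.05        # learning rate (step size)

for _ in range(2000):
    # Gradients of the mean squared error with respect to m and c.
    grad_m = sum(2 * (m * x + c - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_c = sum(2 * (m * x + c - y) for x, y in zip(xs, ys)) / len(xs)
    # Step each parameter against its gradient to reduce the loss.
    m -= lr * grad_m
    c -= lr * grad_c
```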

Some types of learning algorithms / optimization solvers:

  • Gradient Descent and its Variants
  • Adagrad
  • RMSprop
  • Adam
  • Backpropagation (Used for training DNN)
  • Backpropagation Through Time (BPTT: Used for training RNN)

EVALUATION

Our last, but not least, ingredient is evaluation.

Every program or build needs to be evaluated before taking its first step into the world. Evaluation can be viewed as a scoring system based on certain tests, where the score reflects how well the program performs in a real-world scenario. You should always evaluate a model to determine whether it will do a good job of predicting the target on new and future data.

Calculating the accuracy of the model is one way of determining how proficient the model is.





There are certain tools that can help us achieve this. They are called evaluation metrics.

Types of Evaluation Metrics

  • (Top-k) Accuracy
  • Precision, Recall and F1 Score
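These metrics can be computed directly from predictions; a sketch on made-up binary labels:

```python
# Hypothetical ground-truth labels and model predictions.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# Confusion-matrix counts for the positive class.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision = tp / (tp + fp)  # of the predicted positives, how many were right
recall = tp / (tp + fn)     # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)
```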

Ending note

So here is the six-jars representation of machine learning, a unique way to look at machine learning through the concept of jars. If we tie them together, they can be summarized as follows.

  • There is Data (in many forms).
  • And there are some Tasks that can be performed on that data.
  • Based on those tasks, a set of Models can be proposed.
  • A Learning Algorithm then tries to determine the parameters of each Model.
  • And, with the help of a Loss Function, it tries to determine the best Model, the one that provides the closest solution to the actual relation.
  • Such that the loss is minimized.
  • And the model then gets Evaluated with the help of evaluation metrics.

THIS ARTICLE WOULDN'T HAVE BEEN POSSIBLE WITHOUT PADHAI

Thank you for reading this article.

