MLOps : Session - 18
In today's article, I'm going to discuss the Multilayer Perceptron. In my previous article, I already discussed the Single Perceptron. You can check it here.
Let's first look at some more terminology that we use when training our model.
Activation Function:
The activation function is one of the building blocks of a neural network. You can think of it as a formula we give to a neuron so the model can learn during training. If we compare it with our brain, which is a combination of many neurons, the activation function is what ultimately decides what gets fired to the next neuron.
Here are some points that help build a better understanding of activation functions:
- It takes the output signal from the previous cell and converts it into a form that can be taken as input by the next cell.
- A linear function such as y = w*x + b always produces some value and, if not restricted to a certain range, can grow very large in magnitude, especially in very deep neural networks with millions of parameters. In classification models, we usually need the output in a fixed range (between 0 and 1), so we need an activation function, such as the sigmoid, to do this.
- There are many types of activation functions, which we choose based on the output we want: the sigmoid function, ReLU function, step function, tanh function, and many more. You can get details of each here.
Parameter and hyperparameter :
Basically, parameters are the values that the model learns and uses for prediction, such as weights and biases. Hyperparameters are the values we give to the machine for the learning process, i.e., the settings we choose before training starts, such as the learning rate and the number and size of the hidden layers.
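A tiny illustration of the difference, using nothing beyond plain Python: the learning rate and epoch count below are hyperparameters we fix up front, while the weight w and bias b are parameters that the training loop itself adjusts (here it fits the line y = 2x + 1 by gradient descent):

```python
# Hyperparameters: chosen by us before training begins
learning_rate = 0.05
epochs = 200

# Parameters: learned by the model during training
w, b = 0.0, 0.0

# Toy data generated from y = 2x + 1
data = [(x, 2 * x + 1) for x in range(-5, 6)]

for _ in range(epochs):
    for x, y in data:
        pred = w * x + b
        error = pred - y
        # Gradient-descent updates on the parameters only;
        # the hyperparameters never change during training
        w -= learning_rate * error * x
        b -= learning_rate * error
```

After training, w and b end up close to 2 and 1; changing the hyperparameters changes how (and whether) the parameters get there.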
Binary Classification:
In the last deep learning article, we learned how to train a linear regression model. Here we take an example of binary classification...
1. Load Dataset
This is a dataset of bank customer details; given a customer's details, we predict whether or not that customer left the bank.
2. We have some categorical variables, so first we handle these by converting them into numerical form...
This finally gives us the dependent and independent variables.
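As a sketch of these two steps (the column names below are made up for illustration and may differ from the article's actual dataset), encoding the categorical variables and splitting out the dependent and independent variables with pandas might look like:

```python
import pandas as pd

# Toy stand-in for the bank dataset; real column names may differ
df = pd.DataFrame({
    "CreditScore": [600, 720, 850, 500],
    "Geography": ["France", "Spain", "France", "Germany"],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Exited": [1, 0, 0, 1],  # 1 = the customer left the bank
})

# One-hot encode the categorical columns, dropping one level of each
# to avoid redundant (perfectly correlated) dummy columns
df = pd.get_dummies(df, columns=["Geography", "Gender"], drop_first=True)

# Independent variables (features) and dependent variable (label)
X = df.drop("Exited", axis=1)
y = df["Exited"]
```

After this, X contains only numeric columns and y holds the 0/1 label the model will learn to predict.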
3. First we import the Sequential model. We used the same model for regression too.
4. The Dense() function is used to create hidden layers. Let's see...
I create 4 hidden layers. Units is the number of neurons in one layer.
In the first three, we use the ReLU activation function, which outputs zero for negative inputs and passes positive values through.
In the 4th layer, we use the sigmoid function, because this is a binary classification problem in which we need the output as 0 or 1, and the sigmoid function gives an output between 0 and 1.
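Putting the layers together, a sketch of such a model in Keras might look like the following (the unit counts and input size here are illustrative assumptions, not the article's exact values):

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Input(shape=(10,)),                # assumed: 10 input features
    # Three hidden layers with ReLU activation
    Dense(units=6, activation="relu"),
    Dense(units=6, activation="relu"),
    Dense(units=6, activation="relu"),
    # Final layer: a single sigmoid unit, so the output lands in (0, 1)
    Dense(units=1, activation="sigmoid"),
])

# The untrained model already maps inputs to probabilities in (0, 1)
probs = model.predict(np.zeros((2, 10)), verbose=0)
```

The output can then be thresholded (e.g., at 0.5) to turn the probability into a 0/1 class decision.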
5. Now the last part is to train our model.
I use the Adam optimizer with a learning rate of 0.000001. We ran 200 epochs here and found that the loss decreases steadily.
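Compiling and training might look like the sketch below. I use synthetic data, a smaller model, and far fewer epochs so it runs quickly, and a larger learning rate than the article's 0.000001 purely so this toy example converges; these are my own illustrative choices, not the article's exact setup:

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

# Synthetic binary-classification data (stand-in for the bank dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

model = Sequential([
    Input(shape=(4,)),
    Dense(8, activation="relu"),
    Dense(1, activation="sigmoid"),
])

# Adam optimizer; binary cross-entropy is the standard loss
# for a sigmoid output in binary classification
model.compile(optimizer=Adam(learning_rate=0.01),
              loss="binary_crossentropy",
              metrics=["accuracy"])

history = model.fit(X, y, epochs=20, batch_size=32, verbose=0)
```

Inspecting `history.history["loss"]` after training is exactly the "check the loss again and again" step described next.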
The number of hidden layers, the learning rate, the number of neurons, etc. are chosen by checking the loss again and again; there is no fixed way to determine these values.
There are two cross-entropy loss functions commonly used in classification:
- Binary CrossEntropy (for two-class problems like this one)
- Categorical CrossEntropy (for problems with more than two classes)
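As a small preview before those sessions: for a single example with true label y (0 or 1) and predicted probability p, binary cross-entropy is -(y·log p + (1-y)·log(1-p)). A minimal hand-rolled version:

```python
import math

def binary_cross_entropy(y_true, p_pred):
    # Near 0 for a confident correct prediction;
    # grows very large for a confident wrong one
    return -(y_true * math.log(p_pred)
             + (1 - y_true) * math.log(1 - p_pred))
```

For example, predicting p = 0.99 for a true label of 1 gives a loss near zero, while predicting p = 0.1 for the same label gives a much larger loss, which is what pushes the network's parameters in the right direction during training.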
We will talk about both in detail in our next sessions. If some of the terminology here is unfamiliar, I suggest going through the last few articles, which cover the things I have not described here, such as forward and backward propagation, the learning rate, the concept of an epoch, etc.
Hope we meet again...
Happy Learning :)