Generative AI Basic Session: 2
Hello All,
Welcome back to my latest blog on generative AI. In case you missed my initial blog covering the fundamental concepts of Gen AI, I recommend checking it out through the following link.
Before we deep-dive into GenAI, let's spend some time understanding deep learning, since, as you know, GenAI is a subfield of deep learning.
What is Deep Learning? Deep learning is like a super-smart technique in artificial intelligence (AI) that helps computers process information kind of like our brains do. It's all about using something called artificial neural networks, which are like computerized versions of the networks in our brains. These computer brain networks are really good at spotting tricky patterns in things like pictures, text, sounds, and other info. They're so smart that they can give us accurate predictions and cool insights based on the data they've learned from. It's like teaching computers to be brainy and figure out complex stuff!
Since deep learning is based on artificial neural networks, let's understand those first. Neural networks try to mimic the human brain and its learning process. Just as a brain takes an input, processes it, and generates some output, so does a neural network. It has three steps (a small code sketch follows the list):
1. Receiving inputs
2. Processing information
3. Generating output
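To make these steps concrete, here is a minimal sketch of a single artificial neuron in Python (the input values, weights, and bias below are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

inputs = np.array([0.5, -1.2, 3.0])   # 1. receiving inputs
weights = np.array([0.4, 0.7, -0.2])  # learned parameters
bias = 0.1

weighted_sum = np.dot(weights, inputs) + bias  # 2. processing information
output = sigmoid(weighted_sum)                 # 3. generating output
print(output)  # a value between 0 and 1
```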
The complete training process of a neural network involves two steps; a short code sketch follows each.
1. Forward Propagation:
- Input Layer: The numerical values representing pixel intensities in an image are fed into the input layer. Each neuron in the input layer represents a feature or pixel.
- Hidden Layers: Neurons in the hidden layers perform mathematical operations on the input data. These operations involve weighted sums and activation functions. The weights and biases are the parameters that are learned during the training process.
- Output Layer: The final prediction is generated by the neurons in the output layer. This prediction is based on the processed information from the hidden layers.
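Here is a minimal sketch of the forward pass for a network with one hidden layer, assuming sigmoid activations in both layers (the notation matches the formulas later in this post):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    # X has shape (n_features, m): one column per training example.
    Z1 = W1 @ X + b1    # hidden layer: weighted sums plus biases
    A1 = sigmoid(Z1)    # hidden layer: activations
    Z2 = W2 @ A1 + b2   # output layer: weighted sums plus biases
    A2 = sigmoid(Z2)    # output layer: the final prediction
    return Z1, A1, Z2, A2
```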
2. Backward Propagation:
- Error Calculation: The output of the neural network is compared to the actual target values, and the error or loss is calculated. Common loss functions include mean squared error and cross-entropy.
- Backward Pass: The gradients of the loss with respect to the parameters (weights and biases) are calculated using the chain rule of calculus during the backward pass. This involves determining how much each parameter contributed to the error.
- Parameter Update: The parameters are updated using optimization algorithms (e.g., gradient descent) to minimize the loss. The learning rate is a hyperparameter that influences the size of the steps taken during this optimization process.
- Iterative Process: Forward propagation, error calculation, backward pass, and parameter update are repeated for multiple iterations (epochs) until the model converges to a state where the loss is minimized and predictions are accurate (see the training-loop sketch after this list).
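Continuing the forward-pass sketch above (it reuses the forward() helper), here is a minimal backward pass and training loop, assuming a binary cross-entropy loss and plain gradient descent; the toy data is made up for illustration:

```python
def backward(X, Y, cache, W2):
    Z1, A1, Z2, A2 = cache
    m = X.shape[1]  # number of training examples

    # Error calculation: with a sigmoid output and cross-entropy loss,
    # the gradient at the output layer simplifies to (prediction - target).
    dZ2 = A2 - Y
    dW2 = (dZ2 @ A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m

    # Backward pass: the chain rule carries the error into the hidden layer;
    # A1 * (1 - A1) is the derivative of the sigmoid at Z1.
    dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)
    dW1 = (dZ1 @ X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return dW1, db1, dW2, db2

# Toy data: 5 examples with 3 features each, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))
Y = rng.integers(0, 2, size=(1, 5)).astype(float)

# Small random weights, zero biases; 4 hidden neurons.
W1, b1 = rng.normal(size=(4, 3)) * 0.01, np.zeros((4, 1))
W2, b2 = rng.normal(size=(1, 4)) * 0.01, np.zeros((1, 1))

alpha = 0.1  # learning rate: the step size of each parameter update
for epoch in range(1000):  # iterative process
    cache = forward(X, W1, b1, W2, b2)             # forward propagation
    dW1, db1, dW2, db2 = backward(X, Y, cache, W2) # backward propagation
    W1, b1 = W1 - alpha * dW1, b1 - alpha * db1    # parameter update
    W2, b2 = W2 - alpha * dW2, b2 - alpha * db2
```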
This iterative process of forward and backward propagation is fundamental to training neural networks, allowing them to learn from data and make predictions on new, unseen data. The network learns by adjusting its parameters based on the errors made during predictions, gradually improving its performance.
The mathematical formulas below are additional information to explain backpropagation.
Let's define the forward and backward propagation steps with mathematical formulas for a simple neural network with one hidden layer. For clarity, I'll use a specific notation:
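One standard way to write these steps, assuming sigmoid activations and a cross-entropy loss, is:

Forward propagation:

$$Z^{[1]} = W^{[1]} X + b^{[1]}, \qquad A^{[1]} = \sigma(Z^{[1]})$$
$$Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}, \qquad A^{[2]} = \sigma(Z^{[2]})$$

Backward propagation:

$$dZ^{[2]} = A^{[2]} - Y$$
$$dW^{[2]} = \frac{1}{m}\, dZ^{[2]} A^{[1]\top}, \qquad db^{[2]} = \frac{1}{m} \sum dZ^{[2]}$$
$$dZ^{[1]} = \left(W^{[2]\top} dZ^{[2]}\right) \odot \sigma'(Z^{[1]})$$
$$dW^{[1]} = \frac{1}{m}\, dZ^{[1]} X^{\top}, \qquad db^{[1]} = \frac{1}{m} \sum dZ^{[1]}$$

Parameter update:

$$W^{[l]} \leftarrow W^{[l]} - \alpha\, dW^{[l]}, \qquad b^{[l]} \leftarrow b^{[l]} - \alpha\, db^{[l]}$$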
Here m is the number of training examples, ⊙ denotes element-wise multiplication, and σ′ is the derivative of the sigmoid function. The learning rate is denoted by α (alpha).
In the next blog I will share more details about CNNs, RNNs, and LSTMs, the issues with them, and the motivation behind GenAI.
***Cross-entropy is favoured as a loss function for classification tasks because it provides a continuous and differentiable measure that encourages the model to assign higher probabilities to the correct classes. This is crucial for training neural networks using gradient-based optimization algorithms, such as stochastic gradient descent.***
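To see how cross-entropy rewards confident, correct predictions and punishes confident, wrong ones, here is a minimal binary example (the probabilities are made up for illustration):

```python
import numpy as np

def cross_entropy(y_true, y_prob, eps=1e-12):
    # Average binary cross-entropy; clipping avoids log(0).
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y = np.array([1.0])                        # the true class
print(cross_entropy(y, np.array([0.9])))   # ~0.105: confident and correct
print(cross_entropy(y, np.array([0.1])))   # ~2.303: confident and wrong
```

Because the loss grows smoothly as the predicted probability drifts away from the target, its gradient always points the model toward assigning more probability to the correct class.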