ReLU

ReLU (Rectified Linear Unit) is an activation function commonly used in neural networks to introduce non-linearity into the network.
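ReLU itself is just max(0, x); here is a minimal Python sketch (the function name is my own, for illustration):

def relu(x):
    # Pass positive inputs through unchanged; clamp negatives to 0.
    return max(0.0, x)

print(relu(3.5))   # 3.5
print(relu(-2.0))  # 0.0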


It addresses some of the issues associated with the sigmoid and tanh activation functions. The problem with sigmoid is that it is not zero-centered: its outputs are all positive, so gradient updates fluctuate (zig-zag) during backpropagation, and convergence takes a long time, which can consume significant resources.
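A quick illustrative check (plain Python, not from the article): sigmoid squashes every input into (0, 1), so its outputs are always positive and never centered around zero.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Every output lands in (0, 1) -- all positive, never zero-centered.
print([round(sigmoid(x), 3) for x in (-5, -1, 0, 1, 5)])
# [0.007, 0.269, 0.5, 0.731, 0.993]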


Another problem comes from the derivative of sigmoid, which lies in the range (0, 0.25]. During backpropagation these small derivatives are multiplied together layer after layer, so the updated weights end up almost identical to the old weights, eventually leading to the vanishing gradient problem.
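To see why the (0, 0.25] bound matters, here is a small sketch (assuming the sigmoid function above): the derivative is sigma(x) * (1 - sigma(x)), which peaks at 0.25, and chaining even that best case across ten layers leaves almost nothing.

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid_derivative(0.0))  # 0.25, the maximum possible value

# Best-case derivative chained across 10 layers, as backprop would do:
grad = 1.0
for _ in range(10):
    grad *= 0.25
print(grad)  # ~9.5e-07 -- the gradient has effectively vanished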


Tanh is zero-centered but faces the same problem as sigmoid: the vanishing gradient.
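Its derivative, 1 - tanh(x)^2, reaches at most 1 (at x = 0) and decays rapidly once the input saturates; a quick illustrative check:

import math

def tanh_derivative(x):
    t = math.tanh(x)
    return 1.0 - t * t

print(round(tanh_derivative(0.0), 2))  # 1.0
print(round(tanh_derivative(3.0), 2))  # 0.01 -- saturated, gradient nearly gone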


ReLU outputs values in the range [0, ∞). Its derivative is either 0 or 1: negative inputs give 0 and positive inputs give 1, so gradients flowing through active neurons are not scaled down, solving the problem of the vanishing gradient.
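A minimal sketch of that derivative (assuming the relu function above; returning 0 at x = 0 is a common convention):

def relu_derivative(x):
    # 1 for positive inputs, 0 for negative inputs (and 0 at x = 0 by convention).
    return 1.0 if x > 0 else 0.0

print([relu_derivative(x) for x in (-2.0, 3.5)])  # [0.0, 1.0]
# A gradient multiplied by 1 across many active layers does not shrink.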


But ReLU comes with the problem of dead neurons. There can be a scenario where ReLU(z) = 0 for every input a neuron sees.

This happens when the input to a ReLU neuron is always negative, leading to zero gradients during backpropagation and preventing any weight updates.
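A toy illustration with hypothetical numbers (reusing the relu_derivative sketch above): if the weights and bias keep the pre-activation negative for every input, the gradient is zero on every example and gradient descent can never revive the neuron.

# Hypothetical dead neuron: weight and bias drive z below zero for all inputs.
weight, bias = -2.0, -1.0
inputs = [0.5, 1.0, 2.0]

for x in inputs:
    z = weight * x + bias          # -2.0, -3.0, -5.0: always negative
    print(z, relu_derivative(z))   # derivative is 0.0 -> no weight update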

To solve this issue we have a variant of ReLU called Leaky ReLU, which has a small, non-zero slope for negative inputs:

def leaky_relu(x, alpha=0.01):
    if x > 0:
        return x
    else:
        return alpha * x

Here alpha is a small positive constant (typically very close to zero, e.g., 0.01).
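A quick usage check of the function above:

print(leaky_relu(3.0))   # 3.0 -- positive inputs pass through unchanged
print(leaky_relu(-2.0))  # -0.02 -- gradient for negative inputs is alpha, not 0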
