Parameter Count in RNN

All the calculations here are based on the Keras framework, but the idea is similar for any framework.

I work as an intern at Statinfer, where I build machine learning models for different requirements. Recently I started working with RNNs.

In all fairness, I have treated RNNs like a black box for a while, and the reason is that the ideas usually shown (such as unrolling) are quite abstract; the diagrams never tend to show which neuron is connected to where.


An RNN is a type of neural network in which the previous output is fed back into the current computation along with the input at the current time step. When we unroll the network, as shown in the image above, we take in the input for each time step and also pass the output from time step t-1 into the computation at time step t.

For a single neuron in a single cell, the RNN cell has 3 parameters (the bias is included; if a bias is not needed, set use_bias=False in the SimpleRNN layer). As you can see, there are three parameters for a single cell with a single neuron: the input weight, the previous-output weight, and the bias.
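As a quick sketch of that count (my own check, not code from the article), the three parameters can simply be tallied by connection:

```python
# Single cell, single neuron: one input weight (W), one weight on the
# previous output (U), and one bias.
input_weights = 1
recurrent_weights = 1
bias = 1
total = input_weights + recurrent_weights + bias
print(total)  # 3 parameters (2 if use_bias=False is passed to SimpleRNN)
```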


A simple diagram I made so that I can visually keep track of the values added into the cell. In the diagram, the parameters are:

X is the input, W is the input weight, H is the hidden layer output, U is the weight on the previous output value, P is the previous output value, and O is the output of the given cell.
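A minimal NumPy sketch of one step of this cell, using the notation above (tanh is the default activation of Keras's SimpleRNN; the numeric values here are made up purely for illustration):

```python
import numpy as np

# One time step of a single-neuron RNN cell: O = tanh(W * X + U * P + bias)
def rnn_step(X, W, U, P, bias):
    return np.tanh(W * X + U * P + bias)

W, U, bias = 0.5, 0.8, 0.1   # the cell's three learnable parameters
P = 0.0                      # previous output; zero at the first step
for X in [1.0, 2.0, 3.0]:    # unrolling over three time steps
    P = rnn_step(X, W, U, P, bias)  # this step's output feeds the next step
```

Note how the same W, U, and bias are reused at every time step; only X and P change as the loop unrolls.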

Now, if we increase to two computational nodes in a single cell, the parameter count becomes 8.

Here we have to notice the connections between the previous values and the current computation: the previous values are used in the current computation of both nodes.


In this example, there are two computational nodes in a single cell, which means the outputs of both are passed into the next cell state. Both of the previous values are used in the current computation. The diagram on the left shows the connections between the three nodes.

The total of 8 parameters comes from 4 recurrent-kernel weights (the weights on the previous values), 2 input weights, and 2 biases. The diagram gives an idea of how the computation actually takes place in a cell. Notice that I have labelled P1 and P2; these are the previous two outputs, i.e. Ot-1 and Ot-2 (I could have written O0 and O-1, but that brings in confusion).
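The breakdown above can be checked by counting connections (a sketch of my own, assuming a SimpleRNN-style cell with two nodes and a single input):

```python
units, input_dim = 2, 1
recurrent_kernel = units * units   # each node sees both previous outputs: 4
input_weights = input_dim * units  # one input feeding both nodes: 2
biases = units                     # one bias per node: 2
total = recurrent_kernel + input_weights + biases
print(total)  # 8
```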

The last example takes three cell states, each with its own input node, and calculates the number of parameters in a cell state.

In an RNN, parameter sharing happens: the hidden weights are shared across the different cell states (the recurrent weight is the weight on the previous value). Parameter sharing is done in order to reduce the number of values to be learned. For long sequences, learning separate weights for every time step would be computationally infeasible and would often lead to overfitting. Parameter sharing is not a good choice, however, when the structure is very different at different time steps.
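Because the same weights are reused at every time step, the parameter count of a SimpleRNN-style layer depends only on the layer sizes, never on the sequence length. A small helper (my own sketch, with names of my choosing) that matches the counts worked out in this article:

```python
def simple_rnn_param_count(units, input_dim, use_bias=True):
    """Parameter count for a SimpleRNN-style layer with shared weights."""
    kernel = input_dim * units       # input weights
    recurrent = units * units        # weights on the previous outputs
    bias = units if use_bias else 0  # one bias per node, shared over time
    return kernel + recurrent + bias

# The count is the same whether the model unrolls over 3 time steps or 300.
print(simple_rnn_param_count(units=1, input_dim=1))  # 3
print(simple_rnn_param_count(units=2, input_dim=1))  # 8
```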

The number of parameters in this case is 1 recurrent-kernel weight (thanks to parameter sharing), 3 input weights, and 1 bias (the bias is also shared between the cell states in Keras).
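In terms of shapes, that breakdown corresponds to a single computational node fed by three input values (this is my own interpretation of the example, and the variable names are mine):

```python
units, input_dim = 1, 3
recurrent_kernel = units * units   # 1, shared across the cell states
input_weights = input_dim * units  # 3, one per input node
bias = units                       # 1, also shared in Keras
total = recurrent_kernel + input_weights + bias
print(total)  # 5
```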

With this, I was able to understand how the RNN connections are made. A simple curiosity helped me gain a proper understanding of RNNs. For any model, calculating the number of parameters on your own is a good exercise for understanding how the implementation works. (Big thanks to Mohit Kumar, who helped with the formatting and also created the images.)

Thank You
