$$ \newcommand{\diag}[1]{diag({#1})} \newcommand{\reshape}[2]{reshape \left({#1},\ {#2}\right)} \newcommand{\bm}[1]{\textbf{#1}} \newcommand{\der}[2]{\frac{\partial #1}{\partial #2}} $$

1  Introduction

Recurrent Neural Networks (RNNs) are a type of neural network used for mapping sequential data. An RNN has the same architecture as a regular multi-layer perceptron (MLP), except that it takes an additional component in its input along with the data.

An RNN requires the sequential data to be split into sub-parts, which it then processes one at a time. The additional component, concatenated to the input vector of the current sub-part, is the hidden layer output vector obtained for the previous sub-part. In this way, the network learns information about the order of the data in the sequence.
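The concatenation step described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the network derived later: the function name `rnn_forward`, the weight names `W_hidden` and `W_out`, the `tanh` hidden activation, the linear output layer, the omission of biases, the zero initial hidden vector, and the layer sizes are all assumptions made for this example.

```python
import numpy as np


def rnn_forward(xs, W_hidden, W_out, h0=None):
    """Run a simple RNN over a sequence of sub-parts (hypothetical helper).

    xs       : array of shape (T, n), one row per sub-part of the sequence
    W_hidden : array of shape (n_hidden, n + n_hidden), applied to [x_t, h_{t-1}]
    W_out    : array of shape (n_out, n_hidden)
    h0       : initial hidden vector; assumed to be zeros if not given
    """
    n_hidden = W_hidden.shape[0]
    h = np.zeros(n_hidden) if h0 is None else h0
    hidden_states, outputs = [], []
    for x in xs:
        # Concatenate the current sub-part with the previous hidden output.
        z = np.concatenate([x, h])
        # Hidden layer (tanh is an assumed activation, biases are omitted).
        h = np.tanh(W_hidden @ z)
        # Output layer (assumed linear for simplicity).
        y = W_out @ h
        hidden_states.append(h)
        outputs.append(y)
    return np.array(hidden_states), np.array(outputs)


# Illustrative sizes: 2 inputs, 3 hidden neurons, 2 outputs, 4 sub-parts.
rng = np.random.default_rng(0)
n, n_hidden, n_out, T = 2, 3, 2, 4
W_hidden = rng.normal(scale=0.5, size=(n_hidden, n + n_hidden))
W_out = rng.normal(scale=0.5, size=(n_out, n_hidden))
xs = rng.normal(size=(T, n))

hs, ys = rnn_forward(xs, W_hidden, W_out)
print(hs.shape, ys.shape)
```

Because each hidden vector is fed back into the next step, changing the first sub-part changes every later hidden vector. That dependence across steps is what the network uses to carry sequence information.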

Hence, the back propagation equations for RNNs differ from those of regular MLPs, because the weight updates must account for the information carried from one sub-part of the sequence to the next. For this reason, the procedure is called back propagation through time (BPTT).

For this derivation of BPTT, a simple RNN with input vector size \(n=2\), output vector size \(N^{(2)} = 2\) and \(N^{(1)} = 3\) hidden neurons is used, as shown in Figure 1.1.

Figure 1.1: Recurrent Neural Network for BPTT derivation