A Simple Guide To RNNs
AI, But Simple Issue #43

Hello from the AI, but simple team! If you enjoy our content, consider supporting us so we can keep doing what we do.
Our newsletter is no longer sustainable to run at no cost, so we’re relying on different measures to cover operational expenses. Thanks again for reading!
A recurrent neural network (RNN) is a deep learning model trained to take sequential data as input and transform it into sequential data as output.
Sequential data is any data where the order matters, such as words, sentences, or time-series measurements.
RNNs are widely used in natural language processing (NLP) tasks, including language translation and speech recognition.

The key feature of RNNs is their ability to maintain a hidden state that captures information from previous time steps, effectively creating a form of memory. This hidden state, denoted $h_t$, is the middle portion of the unrolled network shown above.
This hidden state is updated at each time step (which we denote as t) as the network processes new input data, enabling the network to learn from both the current input and the historical context.
Mathematically, an RNN computes two main quantities at each step: the hidden state $h_t$ and the output $y_t$.
For each time step $t$, the hidden state $h_t$ is expressed as follows:

$$h_t = f_1(W_x x_t + W_h h_{t-1} + b_h)$$
Here, $W_x$ is the input-to-hidden weight matrix, $W_h$ is the hidden-to-hidden weight matrix, $b_h$ is the hidden bias, $f_1$ is an activation function (usually tanh or sigmoid), and $x_t$ is the input at time step $t$.
For each time step $t$, the output $y_t$ is:

$$y_t = f_2(W_y h_t + b_y)$$
Here, $W_y$ is the hidden-to-output weight matrix, $b_y$ is the output bias, and $f_2$ is an activation function.
The weight matrices and biases in both the hidden-state and output equations are shared temporally.
This means the same set of weights is applied at every time step, which is how RNNs can process sequences of arbitrary length while learning to recognize patterns over time.
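
To make the recurrence concrete, here is a minimal NumPy sketch of a vanilla RNN forward pass. The dimensions, the random weights, and the choice of tanh for $f_1$ and the identity for $f_2$ are illustrative assumptions rather than a fixed recipe; the point is that the same $W_x$, $W_h$, $W_y$, and biases are reused at every time step.

```python
import numpy as np

# Hypothetical dimensions, chosen just for this sketch
input_size, hidden_size, output_size = 3, 4, 2
seq_len = 5

rng = np.random.default_rng(0)

# Weights and biases are created once and reused at every time step
# (this is the "shared temporally" property described above).
W_x = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden
b_h = np.zeros(hidden_size)
W_y = rng.normal(scale=0.1, size=(output_size, hidden_size))  # hidden -> output
b_y = np.zeros(output_size)

def rnn_forward(xs):
    """Run a vanilla RNN over a sequence xs of shape (seq_len, input_size)."""
    h = np.zeros(hidden_size)  # initial hidden state h_0
    outputs = []
    for x_t in xs:
        # h_t = f1(W_x x_t + W_h h_{t-1} + b_h), with f1 = tanh
        h = np.tanh(W_x @ x_t + W_h @ h + b_h)
        # y_t = f2(W_y h_t + b_y); here f2 is the identity for simplicity
        y_t = W_y @ h + b_y
        outputs.append(y_t)
    return np.stack(outputs), h

xs = rng.normal(size=(seq_len, input_size))
ys, final_h = rnn_forward(xs)
print(ys.shape)       # (5, 2): one output per time step
print(final_h.shape)  # (4,): hidden state after the whole sequence
```

Notice how the loop carries `h` forward: each step's hidden state depends on the current input and the previous hidden state, which is exactly the "memory" described above.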
Activation Functions
The most common activation functions used in RNNs are the sigmoid, tanh, and ReLU functions.

Surprisingly, standard RNNs use the hyperbolic tangent and sigmoid functions far more often than ReLU. Because ReLU is unbounded, applying the recurrence over many time steps can let hidden-state values grow without limit, which in extreme cases leads to numerical overflow; tanh and sigmoid squash their outputs into a bounded range, keeping the hidden state stable.
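
As a rough illustration of why bounded activations are preferred, the toy sketch below runs the same recurrence with tanh and with ReLU. The weight matrix (1.2 times the identity) and the constant input drive are deliberately artificial assumptions chosen to make the effect obvious.

```python
import numpy as np

hidden_size = 4
steps = 100

# A toy recurrent weight matrix that slightly amplifies the hidden state,
# plus a constant input drive of 1 at every step (artificial setup for illustration).
W_h = 1.2 * np.eye(hidden_size)
x_drive = np.ones(hidden_size)

def run(activation):
    h = np.zeros(hidden_size)
    for _ in range(steps):
        h = activation(W_h @ h + x_drive)
    return np.max(np.abs(h))

print("tanh :", run(np.tanh))                       # bounded: stays below 1
print("ReLU :", run(lambda z: np.maximum(z, 0.0)))  # keeps growing step after step
```

With tanh the hidden state saturates near 1, while with ReLU it keeps growing at every step; in a real network with larger recurrent weights or longer sequences, that growth can reach overflow.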