Recurrent Neural Networks, Simplified
AI, But Simple Issue #8
Just a small heads up, this week’s issue is going to require some past knowledge of neural networks, machine learning, and some calculus and linear algebra. If you have some past experience with them, you’re probably fine!
A recurrent neural network (RNN) is a deep learning model that is trained to take a sequential data input and transform it into a specific sequential data output.
Sequential data is any type of data where the order of the data points matters, such as words, sentences, or time-series data.
The key feature of RNNs is their ability to maintain a hidden state that captures information from previous time steps, effectively creating a form of memory.
They use information from prior inputs to influence how the current input is processed and what output is produced.
This hidden state is updated at each time step as the network processes new input data, enabling the network to learn from both the current input and the historical context.
Some common applications of the RNN include language translation, natural language processing (NLP), image captioning, and speech recognition.
In the diagram above, this hidden state is visualized by the loops at each neuron.
So why are RNNs used specifically for sequential data? Because they maintain this hidden state, a form of memory, they can process sequences where order and context matter (text, time-series data, etc.) better than other neural network architectures.
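To make this idea concrete, here is a minimal Python sketch of the overall pattern: the network walks through the sequence one element at a time and carries the hidden state forward. The `step` and `readout` functions are hypothetical placeholders standing in for the actual layer computations, not part of any specific library.

```python
def run_rnn(sequence, h, step, readout):
    """Process a sequence one element at a time, carrying the hidden state forward."""
    outputs = []
    for x_t in sequence:            # each element of the sequence is one time step
        h = step(x_t, h)            # the new state depends on the current input AND the past state
        outputs.append(readout(h))  # the output at this step is read off the hidden state
    return outputs
```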
Another distinguishing characteristic of recurrent networks is that they share the same weights across every time step.
In a standard feedforward neural network, each layer has its own unique weight matrix for its connections to the previous layer. This means that every layer learns its weights independently, based on the inputs it receives.
In contrast, in an RNN, each node in a layer uses the same weight matrices for its connections across different time steps.
This means that as the RNN processes each element in a sequence, it applies the same set of weights repeatedly.
This weight sharing allows the network to capture patterns within the sequence efficiently, which is another reason to use this network for sequential data.
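A quick back-of-the-envelope sketch shows why this matters. Assuming some arbitrary toy sizes (chosen only for illustration), a feedforward network that flattens the whole sequence into one input needs weights that grow with the sequence length, while the RNN reuses one fixed set of matrices no matter how long the sequence is:

```python
# Assumed toy sizes, purely for illustration
input_size, hidden_size, seq_len = 4, 8, 50

# Feedforward idea: flatten the whole sequence into one long vector,
# so the first weight matrix must grow with the sequence length.
feedforward_weights = (input_size * seq_len) * hidden_size          # 1600

# RNN idea: one input-to-hidden matrix and one hidden-to-hidden matrix,
# reused at every single time step, no matter how long the sequence is.
rnn_weights = input_size * hidden_size + hidden_size * hidden_size  # 96

print(feedforward_weights, rnn_weights)
```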
The weights are still adjusted during training through gradient descent and backpropagation, with a slight modification known as backpropagation through time (BPTT).
Architecture
Let’s investigate a very simple RNN architecture. This RNN will have an input layer, a hidden layer, and finally an output layer.
For the input layer, a vector x_t is taken as the input to the network at time step t.
The vector depends on your use case. For instance, in natural language processing, each data point could be a word represented by a vector.
In the context of an RNN, a time step (t) refers to a single point in a sequence. For example, if you are processing a sentence word by word, each word represents a time step.
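As a toy illustration of this, the snippet below turns a short sentence into one input vector per time step using a tiny, made-up vocabulary and one-hot encoding; the sentence and vocabulary are assumptions for the example, and real systems typically use learned embeddings instead.

```python
# Hypothetical toy example: each word of a sentence is one time step
sentence = "the cat sat".split()        # ["the", "cat", "sat"]
vocab = {"the": 0, "cat": 1, "sat": 2}  # assumed three-word vocabulary

def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

# One input vector x_t per time step t
inputs = [one_hot(vocab[word], len(vocab)) for word in sentence]
# inputs[0] is x_1 ("the"), inputs[1] is x_2 ("cat"), inputs[2] is x_3 ("sat")
```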
The hidden layer maintains a hidden state h_t that is updated at each time step. This hidden state captures information from previous time steps.

The hidden state h_t at time step t is computed based on the current input x_t and the previous hidden state h_(t-1). This dependence on h_(t-1) is how the hidden state gains its "memory."

h_t = f(W_xh · x_t + W_hh · h_(t-1))

Here, W_xh is the weight matrix from the input to the hidden state, and W_hh is the weight matrix from the hidden state back to itself. The function f is a non-linear transformation such as tanh or ReLU. These matrices are multiplied with x_t and h_(t-1), respectively. There can also be a bias term b added inside f, which is not displayed here.
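Here is a minimal NumPy sketch of a single hidden-state update implementing that equation. The layer sizes, random initialization, and choice of tanh are assumptions made for illustration, not values prescribed by the issue.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 5   # assumed sizes, chosen only for illustration

W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1   # input-to-hidden weights
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1  # hidden-to-hidden weights
b    = np.zeros(hidden_size)                                  # optional bias term

def rnn_step(x_t, h_prev):
    """One update: h_t = tanh(W_xh x_t + W_hh h_(t-1) + b)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b)

h0 = np.zeros(hidden_size)            # initial hidden state
x1 = rng.standard_normal(input_size)  # input at the first time step
h1 = rnn_step(x1, h0)                 # hidden state after one step
```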
For the weights, the RNN has input-to-hidden connections parameterized by a weight matrix W_xh, hidden-to-hidden recurrent connections parameterized by a weight matrix W_hh, and hidden-to-output connections parameterized by a weight matrix W_ho. All of these weights are shared across time.

These weights are initialized at random and are an important part of each node's inner architecture.
The names input-to-hidden, hidden-to-hidden, and so on simply describe a neuron's connections: input-to-hidden is the connection coming into the neuron, hidden-to-hidden is the loop back into itself, and hidden-to-output is the outgoing connection.

Visually, the loop connection at each neuron is represented by the hidden-to-hidden weight matrix multiplied by h_(t-1), where the hidden state from the previous time step is fed back into the neuron to influence the current hidden state.
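Putting all three weight matrices together, here is a sketch of a full forward pass over a short sequence, again assuming small, arbitrary layer sizes for illustration. At each time step the previous hidden state is fed back in through W_hh, and an output is read off through W_ho.

```python
import numpy as np

rng = np.random.default_rng(1)
input_size, hidden_size, output_size = 3, 5, 2   # assumed toy sizes

W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1   # input-to-hidden
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1  # hidden-to-hidden (the loop)
W_ho = rng.standard_normal((output_size, hidden_size)) * 0.1  # hidden-to-output

def forward(sequence):
    h = np.zeros(hidden_size)                # initial hidden state
    outputs = []
    for x_t in sequence:                     # one loop iteration per time step
        h = np.tanh(W_xh @ x_t + W_hh @ h)   # the previous h is fed back in (the recurrence)
        outputs.append(W_ho @ h)             # read an output off the current hidden state
    return outputs, h

sequence = [rng.standard_normal(input_size) for _ in range(4)]  # a length-4 toy sequence
outputs, final_h = forward(sequence)
```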