Recurrent Neural Networks, Explained Mathematically
AI, But Simple Issue #22
A Recurrent Neural Network (RNN) is a special type of neural network designed to handle sequential data.
Sequential data is any type of data where the order matters, such as words, sentences, or time-series data.
At its core, an RNN is trained to take sequential data as an input, and using its hidden state (which acts like its working memory), it transforms the data into a specific output of sequential data.
Unlike feedforward neural networks, RNNs have connections that form cycles, allowing information to be remembered over time.
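To make that cycle concrete, here is a minimal NumPy sketch of the recurrence; the tanh update and the names W_xh, W_hh, and b_h are common conventions we’re assuming, not taken from a specific implementation:

```python
import numpy as np

# A minimal sketch of the recurrence, assuming the common tanh update
# h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b_h). Names are conventional,
# not the article's exact notation.
N, H = 4, 2                                   # input (vocab) size, hidden size
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(H, N))                # input-to-hidden weights
W_hh = rng.normal(size=(H, H))                # hidden-to-hidden weights (the cycle)
b_h = np.zeros(H)

h = np.zeros(H)                               # the "working memory" starts empty
for x in np.eye(N):                           # feed a toy sequence of one-hot inputs
    h = np.tanh(W_xh @ x + W_hh @ h + b_h)    # h carries information across steps
print(h)                                      # final state depends on the whole sequence
```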
We’ve gone over RNNs extensively in previous issues; you can read those to learn the theory behind RNNs.

This week, we’re going into the mathematics of simple RNNs, so a basic understanding of what an RNN is and how it works will be assumed.

Also, for new subscribers or beginners, some context on the mathematical processes of neural networks will help, along with a bit of basic linear algebra (matrix multiplication, vectors, etc.) and calculus.
You can find a math explanation for a simple Multi-Layer Perceptron (MLP), one of the simplest forms of ANNs, here.
Mathematical Formulation
This week, we’re going over a standard RNN for text generation. This type of RNN is fed text data and trained to predict the next word, which is what lets it generate text.
Let’s start with an example using a small phrase: “Hello World”.
We’ll use the vocabulary below for the model, meaning the model only has access to these words during training. We call these individual words tokens.
1. <START>
2. Hello
3. World
4. <END>
Here, you might be wondering what tokens 1 and 4 are. RNNs typically use start and end tokens to mark where a sentence begins and where it ends.
Each word is represented as a one-hot vector: a vector with a 1 at that token’s index and 0s everywhere else.
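As a quick sketch, assuming the tokens are indexed in the order listed above:

```python
import numpy as np

# One-hot encodings, assuming tokens are indexed in the order listed above.
vocab = ["<START>", "Hello", "World", "<END>"]
one_hot = {tok: np.eye(len(vocab))[i] for i, tok in enumerate(vocab)}
print(one_hot["Hello"])   # [0. 1. 0. 0.]
```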
To train an RNN, we need to feed it a data point: an input sequence. With start and end tokens, the phrase “Hello World” works a little differently.

We’ll use the start token in the input sequence and the end token in the target sequence so the model can learn what the first word of a sequence typically is, but also where a sentence should end.
Our input and target sequences look like this:
Input Sequence:
Time Step 1: "<START>"
Time Step 2: "Hello"
Time Step 3: "World"
Target Sequence:
Time Step 1: "Hello"
Time Step 2: "World"
Time Step 3: "<END>"
Let’s talk about target values (the values in the target sequence). The targets an RNN uses act just like labels in other networks. When training a text generator, the target value at each step is simply the next token (word) in the sentence.

For instance, in our example, if our token is the “<START>” token, we would set the target value to be “Hello”.

This structure of target values, fed to the RNN over and over, teaches it both how to start a sentence (what comes after the start token) and how to end one (where the end token should be placed). That is how RNNs generate sentences word by word; the sketch below shows how these pairs are built.
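In code, the input/target pairs above come from shifting the token sequence by one position:

```python
# The target at each time step is just the input shifted one token ahead.
tokens = ["<START>", "Hello", "World", "<END>"]
inputs, targets = tokens[:-1], tokens[1:]
for t, (x, y) in enumerate(zip(inputs, targets), start=1):
    print(f"Time step {t}: input={x}, target={y}")
# Time step 1: input=<START>, target=Hello
# Time step 2: input=Hello, target=World
# Time step 3: input=World, target=<END>
```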
Model Architecture
Let’s quickly glance over our model architecture:
The weight matrices and biases are included in the diagram; read on for an explanation.
Input Size (Matches Vocabulary Size) (N): 4
Hidden Size (H): 2
Output Size (Matches Vocabulary Size) (C): 4
Our vocabulary size is 4, so we have 4 input neurons, as the input vector has a size of 4.
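To make the shapes concrete, here is a rough NumPy sketch of the parameters this architecture implies, plus one forward step through it. The names (W_xh, W_hh, W_hy, b_h, b_y) are conventional choices we’re assuming here, standing in for the diagram:

```python
import numpy as np

# Parameter shapes implied by the sizes above. The names are common
# conventions, assumed here rather than taken from the diagram.
N, H, C = 4, 2, 4                 # input size, hidden size, output size

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(H, N))    # input -> hidden,  shape (2, 4)
W_hh = rng.normal(size=(H, H))    # hidden -> hidden, shape (2, 2)
W_hy = rng.normal(size=(C, H))    # hidden -> output, shape (4, 2)
b_h, b_y = np.zeros(H), np.zeros(C)

# One full step: new hidden state, then a softmax over the vocabulary.
x, h = np.eye(N)[0], np.zeros(H)                # "<START>" one-hot, zero hidden state
h = np.tanh(W_xh @ x + W_hh @ h + b_h)          # hidden state, shape (2,)
logits = W_hy @ h + b_y                         # output scores, shape (4,)
probs = np.exp(logits) / np.exp(logits).sum()   # probabilities over the 4 tokens
print(probs)
```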