Multilayer Perceptrons: The Math Behind Deep Learning
AI, But Simple Issue #17
In the field of deep learning, we solve machine learning problems using Artificial Neural Networks (ANNs).
One of the most common first steps in deep learning is learning about the Multilayer Perceptron (MLP), a specific type of ANN.
An MLP is a type of feedforward artificial neural network, composed of at least three layers: an input layer, one or more hidden layers, and an output layer.
If you're going to tell someone you understand deep learning, it's essential to understand how MLPs work; they are the most common and most foundational architecture.
In our previous issues, we’ve covered a lot about propagation, nodes, activation functions, and gradients of MLPs.
In this issue, we’re going to sum up the process and go into the math of it.
Before continuing, it is highly recommended to have some basic knowledge of linear algebra and calculus.
Mathematics of MLPs
From a mathematical perspective, an MLP is just a composition of functions: each layer applies a transformation to its input to produce a corresponding output.
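Concretely, each layer computes a weighted sum of the previous layer's outputs, adds a bias, and passes the result through an activation function. Writing the activations of layer $k$ as $\mathbf{a}^{(k)}$ and the activation function as $\sigma$ (our labels here, not something fixed by the network itself), one layer can be written as:

$$
\mathbf{a}^{(k)} = \sigma\left(W^{(k)} \mathbf{a}^{(k-1)} + \mathbf{b}^{(k)}\right), \qquad \mathbf{a}^{(0)} = \mathbf{x},
$$

where $W^{(k)}$ is the $k$-th weight matrix and $\mathbf{b}^{(k)}$ is the $k$-th bias vector. Stacking layers simply means composing these functions one after another.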
Starting with the structure, each MLP has at least an input layer, one or more hidden layers, and an output layer.
If you have a neural network with n layers, counting both the input and output as a layer, you will have n-1 weight matrices.
Biases work the same way; if you have n layers, you will have n-1 bias vectors.
For instance, if you have 3 layers (one input, one hidden, and one output), you will have one weight matrix connecting the input to the hidden layer, and another connecting the hidden to the output layer.
The dimensions of each weight matrix are determined by the number of neurons in the layer it connects from and the number of neurons in the layer it connects to.
Take the weight matrix connecting the input layer to the hidden layer: if the input layer has 20 neurons and the hidden layer has 40 neurons, then the weight matrix will have a size of 40 × 20 (rows × columns), with one row per hidden neuron and one column per input neuron.
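As a quick sanity check of this shape rule, here is a minimal NumPy sketch (the sizes 20 and 40 are just the numbers from the example above, and the values are random placeholders):

```python
import numpy as np

n_input, n_hidden = 20, 40

W1 = np.random.randn(n_hidden, n_input)  # weight matrix: 40 rows, 20 columns
x = np.random.randn(n_input, 1)          # one input as a 20 x 1 column vector

z = W1 @ x                               # (40 x 20) times (20 x 1) gives (40 x 1)
print(W1.shape, x.shape, z.shape)        # (40, 20) (20, 1) (40, 1)
```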
Here, the superscript acts as an index for the weight matrix: since this matrix goes from the input to the hidden layer, it is the first one, and the second would be the matrix going from the hidden layer to the next layer.
The subscript specifies the weight's position (its row and column) within that matrix, for easy indexing. In later examples, we will drop the comma.
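As a concrete illustration of this notation (assuming the usual row-then-column ordering, where the first index is the receiving neuron and the second is the source neuron):

$$
w^{(1)}_{2,3}
$$

is the entry in row 2, column 3 of the first weight matrix, i.e. the weight connecting input neuron 3 to hidden neuron 2; without the comma, it is simply written as $w^{(1)}_{23}$.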
However, in MLPs, biases work a little differently, as each bias is a vector with only one column. The input layer does not have a bias vector; biases only start from the first hidden layer.
A bias vector has dimensions (number of nodes in its layer) × 1, so for a hidden layer with 5 nodes, it will look like this:
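Using the same superscript notation as the weights (our labeling here), the bias vector of a 5-node hidden layer is a 5 × 1 column of bias terms:

$$
\mathbf{b}^{(1)} =
\begin{bmatrix}
b^{(1)}_{1} \\
b^{(1)}_{2} \\
b^{(1)}_{3} \\
b^{(1)}_{4} \\
b^{(1)}_{5}
\end{bmatrix}
$$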
Full Example
We’ll do a full example with the Iris dataset now.
The Iris dataset is a classic dataset used in machine learning for classification tasks. It contains 150 samples of iris flowers from three species:
Iris Setosa
Iris Versicolor
Iris Virginica
Each sample has four features:
Sepal Length (cm)
Sepal Width (cm)
Petal Length (cm)
Petal Width (cm)
So the input matrix has a size of 150 × 4, and we will denote it as x.
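If you want to follow along, the dataset ships with scikit-learn (one convenient source among several; this sketch assumes scikit-learn is installed):

```python
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)  # X: feature matrix, y: integer class labels
print(X.shape)  # (150, 4) -> 150 samples, 4 features
print(y.shape)  # (150,)   -> one label (0, 1, or 2) per sample
```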
The MLP we will be using will consist of one input layer, one hidden layer, and one output layer.
The input layer will have 4 neurons, one per feature (column); the hidden layer will have 6 neurons (an arbitrary choice); and the output layer will have 3 neurons, one for each class.
The network looks like this:
The labels on the bottom correspond to the weights and biases the network has.
The weight matrices and biases look like this:
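In terms of shapes, $W^{(1)}$ is 6 × 4, $\mathbf{b}^{(1)}$ is 6 × 1, $W^{(2)}$ is 3 × 6, and $\mathbf{b}^{(2)}$ is 3 × 1. Written out with the same entry notation as before:

$$
W^{(1)} =
\begin{bmatrix}
w^{(1)}_{11} & w^{(1)}_{12} & w^{(1)}_{13} & w^{(1)}_{14} \\
\vdots & \vdots & \vdots & \vdots \\
w^{(1)}_{61} & w^{(1)}_{62} & w^{(1)}_{63} & w^{(1)}_{64}
\end{bmatrix},
\qquad
\mathbf{b}^{(1)} =
\begin{bmatrix}
b^{(1)}_{1} \\ \vdots \\ b^{(1)}_{6}
\end{bmatrix},
$$

$$
W^{(2)} =
\begin{bmatrix}
w^{(2)}_{11} & \cdots & w^{(2)}_{16} \\
w^{(2)}_{21} & \cdots & w^{(2)}_{26} \\
w^{(2)}_{31} & \cdots & w^{(2)}_{36}
\end{bmatrix},
\qquad
\mathbf{b}^{(2)} =
\begin{bmatrix}
b^{(2)}_{1} \\ b^{(2)}_{2} \\ b^{(2)}_{3}
\end{bmatrix}.
$$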
Here, the comma in the subscript is removed, so 11 represents 1, 1.
Forward Pass
Let’s go through a simple process of forward propagation. Suppose we have an iris flower with the following features:
Sepal Length (x1): 5.1 cm
Sepal Width (x2): 3.5 cm
Petal Length (x3): 1.4 cm
Petal Width (x4): 0.2 cm
This input corresponds to one row of the 150 examples.
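To make the forward pass concrete, here is a minimal NumPy sketch of this 4-6-3 network. The issue does not specify the weight values or the activation functions, so the weights below are random placeholders, and we assume a sigmoid activation in the hidden layer and a softmax at the output, which are common but not mandated choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes for the Iris network: 4 inputs -> 6 hidden -> 3 outputs
W1 = rng.standard_normal((6, 4))   # first weight matrix  (6 x 4)
b1 = rng.standard_normal((6, 1))   # first bias vector    (6 x 1)
W2 = rng.standard_normal((3, 6))   # second weight matrix (3 x 6)
b2 = rng.standard_normal((3, 1))   # second bias vector   (3 x 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())        # subtract max for numerical stability
    return e / e.sum()

# The single iris example from above, as a 4 x 1 column vector
x = np.array([[5.1], [3.5], [1.4], [0.2]])

z1 = W1 @ x + b1        # pre-activation of the hidden layer (6 x 1)
a1 = sigmoid(z1)        # hidden layer activations           (6 x 1)
z2 = W2 @ a1 + b2       # pre-activation of the output layer (3 x 1)
y_hat = softmax(z2)     # class probabilities                (3 x 1)

print(y_hat.ravel())    # three probabilities, one per iris species
```

Running this prints three probabilities, one per species; with trained (rather than random) weights, the largest probability would be the network's predicted class.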