Hyperparameter Tuning for Deep Learning
AI, But Simple Issue #6
Hyperparameter tuning is the act of searching for the values of hyperparameters that minimize some loss function.
If you’ve ever explored Kaggle competitions, or simply want to squeeze more performance out of your network, this is one of the most popular ways to do it.
Hyperparameters are user-defined values set before training, while parameters are learned (altered) during training.
For instance, the learning rate is a hyperparameter, while the weights and biases are parameters.
If you don’t know what most of these words mean, please check out a previous issue, where we delved into the process of learning.
There are many different approaches to searching for the best configuration of hyperparameters, and some of the best-known ones are:
Grid Search
Random Search
Bayesian optimization
Popular hyperparameter tuning methods
There are also two main categories of these algorithms: black-box and multi-fidelity.
Black-box hyperparameter tuning treats the model as a black box, meaning it doesn’t take any of the model’s internal details into account. It only considers the inputs (hyperparameters) and the outputs (performance metrics).
It’s easy to implement since it doesn’t require knowledge of the model's internal structure.
But it’s costly: every evaluation is a full training run, so the computational cost adds up quickly, especially with complex models.
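To make the “black box” idea concrete, here is a minimal sketch. A synthetic formula stands in for real model training so the example runs on its own; the point is that the tuner only ever sees hyperparameters go in and a score come out.

import math

# Black-box objective: hyperparameters in, a single score out.
# The tuner never inspects how the score is produced.
def objective(learning_rate, batch_size):
    # Stand-in for training and validating a real network; this synthetic
    # formula pretends that lr around 0.1 and batch_size around 512 work best.
    loss = (learning_rate - 0.1) ** 2 + ((batch_size - 512) / 1000) ** 2
    return loss

# A black-box tuner only interacts with the model through calls like this:
print(objective(learning_rate=0.05, batch_size=256))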
Multi-fidelity hyperparameter tuning uses different levels of fidelity (or accuracy) to optimize hyperparameters.
Multi-fidelity hyperparameter optimization allocates more resources to promising configurations and stops evaluations of poorly performing ones early.
It reduces the overall cost of hyperparameter tuning by using cheaper, approximate evaluations to guide the search.
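As a rough illustration (we won’t cover it further here), a successive-halving style loop captures the idea: every configuration first gets a cheap, low-budget evaluation, and only the better half survives each round to be trained with more resources. The evaluate function below is a placeholder standing in for a short training run.

import random

def evaluate(config, budget):
    # Placeholder: stands in for "train `config` for `budget` epochs
    # and return its validation loss".
    return random.random()

def successive_halving(configs, min_budget=1, max_budget=8):
    budget = min_budget
    while len(configs) > 1 and budget <= max_budget:
        # Evaluate every surviving configuration at the current (cheap) budget.
        scored = sorted(configs, key=lambda c: evaluate(c, budget))
        # Keep only the better half and double their budget next round.
        configs = scored[: max(1, len(configs) // 2)]
        budget *= 2
    return configs[0]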
In this issue, we will only cover black-box hyperparameter tuning methods, as they are simpler and the best place to start.
Grid Search
In grid search, we try every possible configuration of hyperparameters and keep the configuration with the lowest loss. The function that maps hyperparameters (input) to loss (output) is known as the objective function.
When we define candidate values for each hyperparameter in a tabular format (each row is a hyperparameter, and each column holds one of its candidate values), grid search evaluates every single possible combination of them:
# Each hyperparameter maps to a list of candidate values to try.
parameters = {
    'neurons': [10, 100],          # candidate neurons per layer
    'activation': [0, 9],          # candidate activation functions (as indices)
    'optimizer': [0, 7],           # candidate optimizers (as indices)
    'learning_rate': [0.01, 1],    # candidate learning rates
    'batch_size': [200, 1000],     # candidate batch sizes
    'epochs': [20, 100]            # candidate numbers of epochs
}
This is how it searches for the optimal “set” of hyperparameters.
Common hyperparameters include “batch_size”, “learning_rate”, etc.
If you don’t know what these are, read this resource on optimizers.
This is time-consuming and resource-intensive, and the completion time explodes combinatorially as the number of hyperparameter options grows.
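To see both how the search works and why it explodes, here is a minimal sketch that enumerates every combination in the parameters grid above. The train_and_evaluate function is a placeholder for your actual training code.

import itertools
import random

def train_and_evaluate(config):
    # Placeholder: train a model with `config` and return its validation loss.
    return random.random()

def grid_search(parameters):
    names = list(parameters)
    best_config, best_loss = None, float('inf')
    # Enumerate every possible combination of the candidate values.
    for values in itertools.product(*(parameters[n] for n in names)):
        config = dict(zip(names, values))
        loss = train_and_evaluate(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss

# With 2 candidate values for each of the 6 hyperparameters above, this is
# already 2**6 = 64 full training runs; adding more candidate values per
# hyperparameter multiplies that count.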