Have you used Gin Config? If not, I encourage you to try it! It allows you to configure your Python ML experiments in a simple and intuitive way.
Suppose you have an ML experiment like:
# 01_gin/trainer.py
import gin
@gin.configurable
class Transformer:
    def __init__(self, d_model, d_ff, n_layers, dropout):
        # ...

    # Model code...

@gin.configurable
def train(model_class, learning_rate, n_steps):
    model = model_class()
    # Training code...

# ...
Then you can create a configuration file like this:
# 01_gin/my_transformer.gin
Transformer.d_model = 1024
Transformer.d_ff = 4096
Transformer.n_layers = 6
Transformer.dropout = 0.1
train.model_class = @Transformer
train.learning_rate = 0.001
train.n_steps = 100000
Feed the config to Gin:
# 01_gin/trainer.py
# ...
if __name__ == '__main__':
    gin.parse_config_file('my_transformer.gin')
    train()
... and it will forward all your hyperparameters to the appropriate functions automagically!
However, one thing that Gin cannot do is define hyperparameter sweeps. If you want to run a search over different hyperparameter values, you need to write your own loops and override the Gin configs manually, which can lead to messy and hard-to-debug code.
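Without sweep support, the manual version tends to look something like this (a rough sketch using Gin's clear_config and bind_parameter; trainer refers to the module above, and the swept values are just examples):
# Hand-rolled sweep over two hyperparameters: this kind of boilerplate
# quickly gets messy as the sweep grows.
import itertools

import gin

import trainer  # The module defined above.

for d_model, n_layers in itertools.product([512, 1024, 2048], [4, 6, 8]):
    gin.clear_config()  # Drop the bindings left over from the previous run.
    gin.parse_config_file('my_transformer.gin')
    gin.bind_parameter('Transformer.d_model', d_model)
    gin.bind_parameter('Transformer.n_layers', n_layers)
    trainer.train()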
That's where Hyperion comes in! This tool allows you to define hyperparameter sweeps using a syntax similar to Gin's. It also introduces new features for writing configs, including expressions.
To install, run:
pip install https://github.com/pkozakowski/hyperion/releases/download/v0.1.0/hyperion-0.1.0-py3-none-any.whl
Let's say you want to tune the dimensionality of the above model. Hyperion makes it easy:
# 02_one_param_sweep/sweep.hyp
include 'my_transformer.gin' # Load the base config.
Transformer.d_model: [512, 1024, 2048]
Then load the sweep file in Hyperion:
# 02_one_param_sweep/run_sweep.py
import gin
import hyperion
import trainer
for config in hyperion.parse_sweep_file('sweep.hyp'):
    gin.parse_config(config)
    trainer.train()
... and that's it! Hyperion will generate a Gin config for every setting in the sweep, and you can pass each one to Gin as usual.
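If you're curious what comes out, you can simply print the parsed configs (a quick sketch; the exact formatting of the generated configs may differ from what you'd write by hand):
# Inspect the configs generated from the sweep.
import hyperion

for i, config in enumerate(hyperion.parse_sweep_file('sweep.hyp')):
    print(f'# config {i}')
    print(config)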
In a later section we'll see how to run experiments in different processes.
The examples shown here are also available in the examples/ directory.
Now let's say we want to tune both d_model and n_layers in a grid search:
# 03_grid_search/sweep.hyp
include 'my_transformer.gin'
Transformer.d_model: [512, 1024, 2048]
Transformer.n_layers: [4, 6, 8, 10]
Hyperion will generate all 3 × 4 = 12 combinations of the two hyperparameters and output the Gin configs, which you can load the same way as before.
But that's not all - Hyperion can handle much more complex setups.
Grid searches are great, but their sizes can blow up quite quickly. Also, some hyperparameters may depend on one another. Sometimes instead of making one big grid search it's better to run a couple of smaller grids. That's where the union and product blocks come in.
Say that you want to also tune d_ff, but you don't want to make your model too big. You can split your grid into parts with different hyperparameter ranges for each value of d_ff:
# 04_union_product/union_of_products.hyp
include 'my_transformer.gin'
union:
    product:
        Transformer.d_ff = 2048
        Transformer.d_model: [512, 1024, 2048]
        Transformer.n_layers: [4, 6, 8, 10]
    product:
        Transformer.d_ff = 4096
        Transformer.d_model: [512, 1024, 2048]
        Transformer.n_layers: [4, 6, 8] # Limit the number of layers.
    product:
        Transformer.d_ff = 8192
        Transformer.d_model: [512, 1024] # Limit d_model too.
        Transformer.n_layers: [4, 6, 8]
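As a quick sanity check on the sizes: the three products above contribute 3 × 4 + 3 × 3 + 2 × 3 = 27 settings, compared to 3 × 3 × 4 = 36 for the full grid. If you want to verify the count yourself, here is a small sketch (assuming the sweep file above):
# Count the configurations generated by the union-of-products sweep.
import hyperion

configs = list(hyperion.parse_sweep_file('union_of_products.hyp'))
print(len(configs))  # Expected: 3 * 4 + 3 * 3 + 2 * 3 = 27.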
Hyperion defines a sweep as a set of hyperparameter configurations. A union block computes a union over such sets: the resulting sweep contains every configuration from each of its sub-blocks. A product block computes a Cartesian product: every configuration from one sub-block is combined with every configuration from the others. The whole sweep resides in an implicit product block, so the sweeps
# 04_union_product/implicit_product.hyp
include 'my_transformer.gin'
Transformer.d_model: [512, 1024, 2048]
Transformer.n_layers: [4, 6, 8, 10]
and
# 04_union_product/explicit_product.hyp
include 'my_transformer.gin'
product:
    Transformer.d_model: [512, 1024, 2048]
    Transformer.n_layers: [4, 6, 8, 10]
are equivalent.
union and product blocks are composable, so you can do things like:
# 04_union_product/composability.hyp
include 'my_transformer.gin'
product:
    Transformer.n_layers: [4, 6, 8, 10]
    union:
        Transformer.d_model: [512, 1024, 2048]
        Transformer.d_ff: [2048, 4096, 8192]
What if you just want to list the hyperparameter settings that should be run? Tables let you write them very succinctly:
# 05_table/table.hyp
include 'my_transformer.gin'
table (Transformer.d_model, Transformer.d_ff, Transformer.n_layers):
    512, 2048, 4
    1024, 4096, 6
    2048, 8192, 8
And they compose with product and union, so for instance
# 05_table/composability.hyp
include 'my_transformer.gin'
product:
    table (Transformer.d_model, Transformer.d_ff, Transformer.n_layers):
        512, 2048, 4
        1024, 4096, 6
        2048, 8192, 8
    Transformer.n_heads: [2, 4, 8]
will run all 3 numbers of heads for every row of the table, giving 3 × 3 = 9 configurations.
Hyperion implements a reasonable subset of the Python expression language. This is useful, for instance, when you want to compute one hyperparameter based on another.
Suppose you want to tune d_model and set d_ff to be 4 times that value. Normally you'd have to list those settings manually in a table, but with expressions and Gin macros you can just:
# 06_expressions/one_param_sweep.hyp
include 'my_transformer.gin'
d_model: [512, 1024, 2048]
Transformer.d_model = %d_model
Transformer.d_ff = %d_model * 4
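For comparison, spelling the same three settings out as a table would look roughly like this (the expression-based sweep above stays correct even if you change the list of d_model values, while the table has to be edited by hand):
# The same settings written out manually, for comparison only.
include 'my_transformer.gin'

table (Transformer.d_model, Transformer.d_ff):
    512, 2048
    1024, 4096
    2048, 8192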
Of course you can tune the multiplier too:
# 06_expressions/two_param_sweep.hyp
include 'my_transformer.gin'
d_model: [512, 1024, 2048]
d_ff_mul: [1, 2, 4]
Transformer.d_model = %d_model
Transformer.d_ff = %d_model * %d_ff_mul
Gin supports calling Python functions from the configs:
# 07_function_calls/trainer.py
import math
import gin
# ...
@gin.configurable
def compute_learning_rate(base, batch_size):
    # Learning rate scaling according to https://arxiv.org/pdf/1404.5997.pdf.
    return base * math.sqrt(batch_size)

@gin.configurable
def train(model_class, learning_rate, n_steps):
    model = model_class()
    # Training code...

# ...
But you have to supply the arguments of the call as separate bindings:
# 07_function_calls/config.gin
include 'my_transformer.gin'
compute_learning_rate.base = 0.001
compute_learning_rate.batch_size = 64
train.learning_rate = @compute_learning_rate()
This looks a bit weird and can become cumbersome when you have multiple calls to the same function.
Hyperion allows you to supply the arguments in the call itself:
# 07_function_calls/sweep.hyp
include 'my_transformer.gin'
batch_size: [64, 128, 256]
train.learning_rate = @compute_learning_rate(base=0.001, batch_size=%batch_size)
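Just to make the arithmetic concrete: with the square-root scaling rule from compute_learning_rate above, the three batch sizes resolve to the following learning rates (a quick check you can run by hand):
# Learning rates produced by the sweep above (base = 0.001).
import math

for batch_size in [64, 128, 256]:
    print(batch_size, 0.001 * math.sqrt(batch_size))
# 64 -> 0.008, 128 -> ~0.0113, 256 -> 0.016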
You might have noticed that the sweeps we've written so far were a bit redundant. For instance, in
# 08_with/flat.hyp
include 'my_transformer.gin'
Transformer.d_model: [512, 1024, 2048]
Transformer.n_layers: [4, 6, 8, 10]
the Transformer. prefix occurs on multiple lines. You can shorten this using with blocks:
# 08_with/nested.hyp
include 'my_transformer.gin'
with Transformer:
    d_model: [512, 1024, 2048]
    n_layers: [4, 6, 8, 10]
They compose with the other block types, for instance:
# 08_with/composability.hyp
include 'my_transformer.gin'
with Transformer:
    union:
        table (d_model, d_ff, n_layers):
            512, 2048, 4
            1024, 4096, 6
            2048, 8192, 8
        n_heads: [2, 4, 8]
In practical scenarios, you'll probably want to run your experiments in different processes, possibly on different machines, in a cluster job queue, or in the cloud. You can use the Hyperion CLI to generate configs, save them to files, and then forward them to the training processes.
You'll need a training script which can receive a config file as an argument:
# 09_multi_process/trainer.py
import sys
# ...
if __name__ == '__main__':
    config_path = sys.argv[1]
    gin.parse_config_file(config_path)
    train()
Generate configs using the hyperion command. The first argument is the sweep file, the second is the directory to save the configs in. For a sweep file named sweep.hyp, the generated configs will be named sweep_*.gin, where * stands for consecutive numbers starting from 0.
Then launch the experiments:
# 09_multi_process/run_sweep.sh
hyperion sweep.hyp configs/
for config in configs/sweep_*.gin; do
    python trainer.py $config
done
Hyperion aims to be a superset of the Gin configuration language, so that any Gin config is a valid Hyperion sweep containing one hyperparameter set. Conversely, Hyperion generates valid Gin configs. All of the code is thoroughly tested using the awesome hypothesis package to ensure this.
Some of Hyperion's features (expressions, function calls and with blocks) are also useful for writing plain configs. To use them in isolation from sweeps, just call the parse_config_* functions from the hyperion module instead of gin:
# 10_config/trainer.py
import hyperion
# ...
if __name__ == '__main__':
    hyperion.parse_config_file('config.gin')
    train()
Then you can write configs like:
# 10_config/config.gin
d_model = 2048
d_ff_mul = 4
batch_size = 64
with Transformer:
    d_model = %d_model
    d_ff = %d_model * %d_ff_mul
    n_heads = 8
    n_layers = 6

with train:
    model_class = @Transformer
    learning_rate = @compute_learning_rate(base=0.001, batch_size=%batch_size)
    n_steps = 100000
Note that the @configurable decorators should still be imported from the gin module.
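Putting it together, the import pattern looks roughly like this (a sketch; train stands in for the configurables defined earlier):
# Sketch of the import pattern for a config-only setup.
import gin        # @gin.configurable still comes from gin.
import hyperion   # Config parsing goes through hyperion.

@gin.configurable
def train(model_class, learning_rate, n_steps):
    # Training code...
    pass

if __name__ == '__main__':
    hyperion.parse_config_file('config.gin')
    train()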
If you're using Hyperion in your research, please consider citing this repo:
@misc{Kozakowski2022,
  author = {Piotr Kozakowski},
  title = {Hyperion: Configuration tool for ML hyperparameter sweeps},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/pkozakowski/hyperion}},
}