From 2abb6ebaf22ae64087fa4127014587dd1e4d82f8 Mon Sep 17 00:00:00 2001
From: Hiwot Kassa
Date: Wed, 2 Oct 2024 20:55:46 -0700
Subject: [PATCH] fixed typo

---
 training_rules.adoc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/training_rules.adoc b/training_rules.adoc
index 5c6470e..39f58da 100644
--- a/training_rules.adoc
+++ b/training_rules.adoc
@@ -276,7 +276,7 @@ The MLPerf verifier scripts checks all hyperparameters except those with names m
 
 |===
 |Model |Optimizer |Name |Constraint |Definition |Reference Code |Latest version available
-|bert |lamb |global_batch_size |unconstrained |The glboal batch size for training. |--train_batch_size |v4.1
+|bert |lamb |global_batch_size |unconstrained |The global batch size for training. |--train_batch_size |v4.1
 |bert |lamb |opt_base_learning_rate |unconstrained |The base learning rate. |--learning_rate |v4.1
 |bert |lamb |opt_epsilon |unconstrained |adam epsilon |link:https://github.com/mlperf/training/blob/fb058e3849c25f6c718434e60906ea3b0cb0f67d/language_model/tensorflow/bert/optimization.py#L75[reference code] |v4.1
 |bert |lamb |opt_learning_rate_training_steps |unconstrained |Step at which your reach the lowest learning late |link:https://github.com/mlperf/training/blob/master/language_model/tensorflow/bert/run_pretraining.py#L64[reference code] |v4.1
@@ -319,7 +319,7 @@ The MLPerf verifier scripts checks all hyperparameters except those with names m
 |llama2_70b_lora |adamw |opt_learning_rate_warmup_ratio | unconstrained |ratio of steps out of training for linear warmup during initial checkpoint generation. This only affects the learning rate curve in the benchmarking region. |See PR (From Habana, TODO Link) |v4.1
 |llama2_70b_lora |adamw |opt_learning_rate_training_steps | unconstrained |Step when the end of cosine learning rate curve is reached. Learning rate cosine decay is in range (opt_learning_rate_warmup_steps + 1,opt_learning_rate_decay_steps]. |See PR (From Habana, TODO Link) |v4.1
 |llama2_70b_lora |adamw |opt_base_learning_rate |unconstrained | base leraning rate |See PR (From Habana, TODO Link) |v4.1
-|stable diffusion |adamw |global_batch_size |unconstrained |The glboal batch size for training |link:https://github.com/mlcommons/training/blob/master/stable_diffusion/main.py#L633[reference code] |v4.1
+|stable diffusion |adamw |global_batch_size |unconstrained |The global batch size for training |link:https://github.com/mlcommons/training/blob/master/stable_diffusion/main.py#L633[reference code] |v4.1
 |stable diffusion |adamw |opt_adamw_beta_1 |0.9 |coefficients used for computing running averages of gradient and its square |link:https://github.com/mlcommons/training/blob/master/stable_diffusion/ldm/models/diffusion/ddpm.py#L1629[reference code] |v4.1
 |stable diffusion |adamw |opt_adamw_beta_2 |0.999 |coefficients used for computing running averages of gradient and its square |link:https://github.com/mlcommons/training/blob/master/stable_diffusion/ldm/models/diffusion/ddpm.py#L1630[reference code] |v4.1
 |stable diffusion |adamw |opt_adamw_epsilon |1e-08 |term added to the denominator to improve numerical stability |link:https://github.com/mlcommons/training/blob/master/stable_diffusion/ldm/models/diffusion/ddpm.py#L1631[reference code] |v4.1
@@ -756,4 +756,4 @@ MLPerf recommends calculating _utilization_ as `model_tensor_flops / (peak_syste
 
 Use of `hardware_tensor_flops` (defined as model_tensor_flops plus operations added due to activation recomputation), instead of `model_tensor_flops` is strongly discouraged because those are not useful flops for the model. If `hardware_tensor_flops` are used for calculating utilization, it is recommended to also provide an accompanying calculation with `model_tensor_flops`.
 
-Note _utilization_ is not an official MLPerf metric.
\ No newline at end of file
+Note _utilization_ is not an official MLPerf metric.
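
For readers of the section touched by the final hunk, the sketch below (not part of the patch itself) illustrates the recommended `model_tensor_flops`-based utilization calculation. It is a minimal example; the function name and all numeric values are hypothetical placeholders, not MLPerf results.

[source,python]
----
def utilization(model_tensor_flops: float,
                peak_system_tensor_flops_per_second: float,
                runtime_seconds: float) -> float:
    """Utilization as recommended by the rules:
    model_tensor_flops / (peak_system_tensor_flops_per_second * runtime_seconds).
    Uses model tensor FLOPs, not hardware tensor FLOPs (which would also count
    recomputed activations)."""
    return model_tensor_flops / (peak_system_tensor_flops_per_second * runtime_seconds)


# Hypothetical example: 9.0e20 model tensor FLOPs on a system with a peak of
# 1.0e18 tensor FLOP/s, finishing in 1800 seconds -> utilization of 0.5.
print(utilization(9.0e20, 1.0e18, 1800))
----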