From 2abb6ebaf22ae64087fa4127014587dd1e4d82f8 Mon Sep 17 00:00:00 2001
From: Hiwot Kassa
Date: Wed, 2 Oct 2024 20:55:46 -0700
Subject: [PATCH] fixed typo

---
 training_rules.adoc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/training_rules.adoc b/training_rules.adoc
index 5c6470e..39f58da 100644
--- a/training_rules.adoc
+++ b/training_rules.adoc
@@ -276,7 +276,7 @@ The MLPerf verifier scripts checks all hyperparameters except those with names m
 
 |===
 |Model |Optimizer |Name |Constraint |Definition |Reference Code |Latest version available
-|bert |lamb |global_batch_size |unconstrained |The glboal batch size for training. |--train_batch_size |v4.1
+|bert |lamb |global_batch_size |unconstrained |The global batch size for training. |--train_batch_size |v4.1
 |bert |lamb |opt_base_learning_rate |unconstrained |The base learning rate. |--learning_rate |v4.1
 |bert |lamb |opt_epsilon |unconstrained |adam epsilon |link:https://github.com/mlperf/training/blob/fb058e3849c25f6c718434e60906ea3b0cb0f67d/language_model/tensorflow/bert/optimization.py#L75[reference code] |v4.1
 |bert |lamb |opt_learning_rate_training_steps |unconstrained |Step at which your reach the lowest learning late |link:https://github.com/mlperf/training/blob/master/language_model/tensorflow/bert/run_pretraining.py#L64[reference code] |v4.1
@@ -319,7 +319,7 @@ The MLPerf verifier scripts checks all hyperparameters except those with names m
 |llama2_70b_lora |adamw |opt_learning_rate_warmup_ratio | unconstrained |ratio of steps out of training for linear warmup during initial checkpoint generation. This only affects the learning rate curve in the benchmarking region. |See PR (From Habana, TODO Link) |v4.1
 |llama2_70b_lora |adamw |opt_learning_rate_training_steps | unconstrained |Step when the end of cosine learning rate curve is reached. Learning rate cosine decay is in range (opt_learning_rate_warmup_steps + 1,opt_learning_rate_decay_steps]. |See PR (From Habana, TODO Link) |v4.1
 |llama2_70b_lora |adamw |opt_base_learning_rate |unconstrained | base leraning rate |See PR (From Habana, TODO Link) |v4.1
-|stable diffusion |adamw |global_batch_size |unconstrained |The glboal batch size for training |link:https://github.com/mlcommons/training/blob/master/stable_diffusion/main.py#L633[reference code] |v4.1
+|stable diffusion |adamw |global_batch_size |unconstrained |The global batch size for training |link:https://github.com/mlcommons/training/blob/master/stable_diffusion/main.py#L633[reference code] |v4.1
 |stable diffusion |adamw |opt_adamw_beta_1 |0.9 |coefficients used for computing running averages of gradient and its square |link:https://github.com/mlcommons/training/blob/master/stable_diffusion/ldm/models/diffusion/ddpm.py#L1629[reference code] |v4.1
 |stable diffusion |adamw |opt_adamw_beta_2 |0.999 |coefficients used for computing running averages of gradient and its square |link:https://github.com/mlcommons/training/blob/master/stable_diffusion/ldm/models/diffusion/ddpm.py#L1630[reference code] |v4.1
 |stable diffusion |adamw |opt_adamw_epsilon |1e-08 |term added to the denominator to improve numerical stability |link:https://github.com/mlcommons/training/blob/master/stable_diffusion/ldm/models/diffusion/ddpm.py#L1631[reference code] |v4.1
@@ -756,4 +756,4 @@ MLPerf recommends calculating _utilization_ as `model_tensor_flops / (peak_syste
 
 Use of `hardware_tensor_flops` (defined as model_tensor_flops plus operations added due to activation recomputation), instead of `model_tensor_flops` is strongly discouraged because those are not useful flops for the model. If `hardware_tensor_flops` are used for calculating utilization, it is recommended to also provide an accompanying calculation with `model_tensor_flops`.
 
-Note _utilization_ is not an official MLPerf metric.
\ No newline at end of file
+Note _utilization_ is not an official MLPerf metric.
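
For readers of the section touched by the final hunk, the sketch below (not part of the patch itself) illustrates the recommended `model_tensor_flops`-based utilization calculation. It is a minimal example; the function name and all numeric values are hypothetical placeholders, not MLPerf results.

[source,python]
----
def utilization(model_tensor_flops: float,
                peak_system_tensor_flops_per_second: float,
                runtime_seconds: float) -> float:
    """Utilization as recommended by the rules:
    model_tensor_flops / (peak_system_tensor_flops_per_second * runtime_seconds).
    Uses model tensor FLOPs, not hardware tensor FLOPs (which would also count
    recomputed activations)."""
    return model_tensor_flops / (peak_system_tensor_flops_per_second * runtime_seconds)


# Hypothetical example: 9.0e20 model tensor FLOPs on a system with a peak of
# 1.0e18 tensor FLOP/s, finishing in 1800 seconds -> utilization of 0.5.
print(utilization(9.0e20, 1.0e18, 1800))
----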