New Features and Optimizations
- Added context parallel (CP) support for SFT. CP requires you to prepare your dataset using NeMo's `prepare_packed_ft_dataset.py` script prior to training. Be sure to pass the context parallel size to this script, for example:

  ```bash
  python scripts/nlp_language_modeling/prepare_packed_ft_dataset.py \
     model.data.train_ds.file_names=[/path/to/training.jsonl] \
     model.data.train_ds.max_seq_length=2048 \
     +tokenizer_path=/path/to/tokenizer \
     +output_dir=/path/to/output_folder \
     +pack_sizes=[2048,4096,8192] \
     model.context_parallel_size=2
  ```

  CP can then be enabled in your training run by setting `model.context_parallel_size` in your config (see the sketch after this list). Refer to the SFT documentation for more details on running `prepare_packed_ft_dataset.py` and on running SFT with a packed dataset.
- Sequence packing is now supported when running DPO.
- Added support for Knowledge Distillation with SFT. See the tutorial for details.
- Added support for Megatron Core’s distributed optimizer, which can be configured using `++model.optim.name=mcore_distributed_optim` (see the sketch after this list).
- Introduced `ScopedTimer` as a successor to `SyncedTimer`. `SyncedTimer` is marked for deprecation and will be removed in the next version.

  ```python
  from nemo_aligner.utils.distributed import ScopedTimer

  timer = ScopedTimer()

  # All durations are logged in the timer
  with timer("step_time"):
      with timer("fwd"):
          model.fwd()
      with timer("bwd"):
          model.bwd()

  # Consume all durations and reset internal store
  durations = timer.consume_durations()
  ```
- Added code and instructions for replicating the Reward Modeling training in HelpSteer2 and HelpSteer2-Preference.
- Implemented the REINFORCE algorithm.
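
The following is a minimal sketch of a training launch that uses the two config-driven features above: context parallelism for SFT and Megatron Core’s distributed optimizer. The script path, dataset path, and checkpoint path are assumptions for illustration; only `model.context_parallel_size` and `++model.optim.name=mcore_distributed_optim` come from this release, and the SFT documentation remains the authoritative reference.

```bash
# Hypothetical SFT launch: the script and file paths are placeholders.
# model.context_parallel_size should match the value passed to
# prepare_packed_ft_dataset.py when the packed dataset was created.
python examples/nlp/gpt/train_gpt_sft.py \
   model.restore_from_path=/path/to/model.nemo \
   model.data.train_ds.file_names=[/path/to/training.jsonl] \
   model.context_parallel_size=2 \
   ++model.optim.name=mcore_distributed_optim
```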
Breaking Changes
- Upgraded the TRTLLM dependency from v0.10.0 to v0.12.0 and migrated from the `GPTSession` C++ runtime to the `ModelRunner` Python runtime. Please use the latest Dockerfile.
- Using the latest TransformerEngine versions may require `++model.dist_ckpt_load_strictness=log_all` when loading from an older pre-existing checkpoint to avoid erroring out (see the sketch after this list).
- NeMo-Aligner now requires Megatron-LM==0.9.0 for the APIs used to calculate microbatch sizes (the `megatron.core.num_microbatches_calculator.reconfigure_num_microbatch_calculator` API introduced in that version).
- NeMo-Aligner now requires a version of NeMo with this change to how the MoE spec is handled: NVIDIA/NeMo#9035.
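
As a sketch of the TransformerEngine note above, the `++model.dist_ckpt_load_strictness=log_all` override is simply appended to an existing training command when resuming from an older checkpoint. The script and paths below are placeholders, not part of this release.

```bash
# Hypothetical resume from an older pre-existing checkpoint: only the
# dist_ckpt_load_strictness override is the flag described above.
python examples/nlp/gpt/train_gpt_sft.py \
   model.restore_from_path=/path/to/old_model.nemo \
   model.data.train_ds.file_names=[/path/to/training.jsonl] \
   ++model.dist_ckpt_load_strictness=log_all
```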
Bug Fixes
- For stability, it is now required to add `export NCCL_ALGO=...` to scripts launching the PPO training loop. Please see the RLHF documentation for more information.
Deprecation Notices
- `SyncedTimer` is marked for deprecation and will be removed in `0.7.0`. Please switch to `ScopedTimer`.
- `broadcast_2d_tensor` and `broadcast_2d_tensor_within_pp` are marked for deprecation and will be removed in `0.7.0`. Please switch to `broadcast_tensor` and `broadcast_tensor_within_pp`.