New Features and Optimizations
- Added context parallel (CP) support for SFT. CP requires you to prepare your dataset using NeMo's `prepare_packed_ft_dataset.py` script prior to training. Be sure to pass the context parallel size to this script, for example:

  ```bash
  python scripts/nlp_language_modeling/prepare_packed_ft_dataset.py \
     model.data.train_ds.file_names=[/path/to/training.jsonl] \
     model.data.train_ds.max_seq_length=2048 \
     +tokenizer_path=/path/to/tokenizer \
     +output_dir=/path/to/output_folder \
     +pack_sizes=[2048,4096,8192] \
     model.context_parallel_size=2
  ```

  CP can then be enabled in your training run by setting `model.context_parallel_size` in your config (see the sketch after this list). Refer to the SFT documentation for more details on running `prepare_packed_ft_dataset.py` and on running SFT with a packed dataset.
- Sequence packing is now supported when running DPO.
- Added support for Knowledge Distillation with SFT. See the tutorial for details.
- Added support for Megatron Core’s distributed optimizer, which can be configured using `++model.optim.name=mcore_distributed_optim` (see the sketch after this list).
- Introduced `ScopedTimer` as a successor to `SyncedTimer`. `SyncedTimer` is marked for deprecation and will be removed in the next version.

  ```python
  from nemo_aligner.utils.distributed import ScopedTimer

  timer = ScopedTimer()

  # All durations are logged in the timer
  with timer("step_time"):
      with timer("fwd"):
          model.fwd()
      with timer("bwd"):
          model.bwd()

  # Consume all durations and reset internal store
  durations = timer.consume_durations()
  ```
- Added code and instructions for replicating the Reward Modeling training in HelpSteer2 and HelpSteer2-Preference.
- Implemented the REINFORCE algorithm.
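
The following is a minimal sketch of a training launch that uses the two config-driven features above: context parallelism for SFT and Megatron Core’s distributed optimizer. The script path, dataset path, and checkpoint path are assumptions for illustration; only `model.context_parallel_size` and `++model.optim.name=mcore_distributed_optim` come from this release, and the SFT documentation remains the authoritative reference.

```bash
# Hypothetical SFT launch: the script and file paths are placeholders.
# model.context_parallel_size should match the value passed to
# prepare_packed_ft_dataset.py when the packed dataset was created.
python examples/nlp/gpt/train_gpt_sft.py \
   model.restore_from_path=/path/to/model.nemo \
   model.data.train_ds.file_names=[/path/to/training.jsonl] \
   model.context_parallel_size=2 \
   ++model.optim.name=mcore_distributed_optim
```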
Breaking Changes
- Upgraded the TRTLLM dependency from v0.10.0 to v0.12.0 and migrated from the `GPTSession` C++ runtime to the `ModelRunner` Python runtime. Please use the latest Dockerfile.
- Using the latest TransformerEngine versions may require `++model.dist_ckpt_load_strictness=log_all` when loading from an older pre-existing checkpoint to avoid erroring out (see the sketch after this list).
- NeMo-Aligner now requires Megatron-LM==0.9.0 for the APIs used to calculate microbatch sizes (the `megatron.core.num_microbatches_calculator.reconfigure_num_microbatch_calculator` API introduced in that version).
- NeMo-Aligner now requires a version of NeMo with this change to how the MoE spec is handled: NVIDIA/NeMo#9035.
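
As a sketch of the TransformerEngine note above, the `++model.dist_ckpt_load_strictness=log_all` override is simply appended to an existing training command when resuming from an older checkpoint. The script and paths below are placeholders, not part of this release.

```bash
# Hypothetical resume from an older pre-existing checkpoint: only the
# dist_ckpt_load_strictness override is the flag described above.
python examples/nlp/gpt/train_gpt_sft.py \
   model.restore_from_path=/path/to/old_model.nemo \
   model.data.train_ds.file_names=[/path/to/training.jsonl] \
   ++model.dist_ckpt_load_strictness=log_all
```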
Bug Fixes
- For stability, it is now required to add `export NCCL_ALGO=...` to scripts launching the PPO training loop. Please see the RLHF documentation for more information.
Deprecation Notices
- `SyncedTimer` is marked for deprecation and will be removed in `0.7.0`. Please switch to `ScopedTimer`.
- `broadcast_2d_tensor` and `broadcast_2d_tensor_within_pp` are marked for deprecation and will be removed in `0.7.0`. Please switch to `broadcast_tensor` and `broadcast_tensor_within_pp`.