NVIDIA NeMo-Aligner 0.6.0

@ko3n1g released this 07 Jan 23:06 · 6d3dee5

New Features and Optimizations

  • Added context parallel (CP) support for SFT. CP requires you to prepare your dataset using NeMo's prepare_packed_ft_dataset.py script prior to training. Be sure to pass the context parallel size to this script, for example:

    python scripts/nlp_language_modeling/prepare_packed_ft_dataset.py \
       model.data.train_ds.file_names=[/path/to/training.jsonl] \
       model.data.train_ds.max_seq_length=2048 \
       +tokenizer_path=/path/to/tokenizer \
       +output_dir=/path/to/output_folder \
       +pack_sizes=[2048,4096,8192] \
       model.context_parallel_size=2
    

    CP can then be enabled in your training run by setting model.context_parallel_size in your config, as in the sketch below. Refer to the SFT documentation for more details on running prepare_packed_ft_dataset.py and on running SFT with a packed dataset.
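
    A minimal sketch of such a training command, assuming the SFT entry point examples/nlp/gpt/train_gpt_sft.py and an illustrative packed-file path (neither is specified in these notes; model.context_parallel_size is the setting that matters):

    # NOTE: the script path and packed-dataset file below are illustrative assumptions;
    # the relevant setting from this release is model.context_parallel_size.
    python examples/nlp/gpt/train_gpt_sft.py \
       model.data.train_ds.file_names=[/path/to/output_folder/packed_dataset.npy] \
       model.context_parallel_size=2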

  • Sequence packing is now supported when running DPO.

  • Added support for Knowledge Distillation with SFT. See the tutorial for details.

  • Added support for Megatron Core’s distributed optimizer, which can be configured by setting ++model.optim.name=mcore_distributed_optim, as in the sketch below.
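
    For instance, the override can be appended to whichever Aligner training command you already use (the script below is only an illustrative entry point, not prescribed by these notes):

    # Any NeMo-Aligner training script applies; the ++model.optim.name override is the relevant part.
    python examples/nlp/gpt/train_gpt_sft.py \
       <your existing config overrides> \
       ++model.optim.name=mcore_distributed_optim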

  • Introduced ScopedTimer as a successor to SyncedTimer. SyncedTimer is marked for deprecation and will be removed in the next version.

    from nemo_aligner.utils.distributed import ScopedTimer
    timer = ScopedTimer()
    
    # All durations are logged in the timer
    with timer("step_time"):
        with timer("fwd"):
            model.fwd()
        with timer("bwd"):
            model.bwd()
    
    # Consume all durations and reset internal store
    durations = timer.consume_durations()
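
    Per the comments in the snippet, consume_durations() hands back the accumulated timings and clears the timer's internal store, so each subsequent logging step starts from an empty set of measurements.
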
  • Added code and instructions for replicating Reward Modeling training in HelpSteer2 and HelpSteer2-Preference.

  • Implemented the REINFORCE algorithm.

Breaking Changes

  • Upgraded the TRTLLM dependency from v0.10.0 to v0.12.0 and migrated from the GPTSession C++ runtime to the ModelRunner Python runtime. Please use the latest Dockerfile.
  • Using the latest TransformerEngine versions may require setting ++model.dist_ckpt_load_strictness=log_all when loading an older pre-existing checkpoint to avoid errors.
  • NeMo-Aligner now requires Megatron-LM==0.9.0 for its microbatch size calculation APIs (specifically, the newly introduced megatron.core.num_microbatches_calculator.reconfigure_num_microbatch_calculator).
  • NeMo-Aligner now requires a version of NeMo that includes this change to how the MoE spec is handled: NVIDIA/NeMo#9035.

Bug Fixes

  • For stability, it is now required to add export NCCL_ALGO=... to scripts that launch the PPO training loop. Please see the RLHF docs for more information.

Deprecation Notices

  • SyncedTimer is marked for deprecation and will be removed in 0.7.0. Please switch to ScopedTimer.
  • broadcast_2d_tensor and broadcast_2d_tensor_within_pp are marked for deprecation and will be removed in 0.7.0. Please switch to broadcast_tensor and broadcast_tensor_within_pp.