Preference Optimization on Reasoning Traces (PORT)

This is the official code for the paper PORT: Preference Optimization on Reasoning Traces, which proposes applying preference optimization techniques to chain-of-thought reasoning steps in order to enhance the reasoning performance of language models.
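As a concrete illustration (not taken from the repository's data files), a single preference record built from reasoning traces pairs a question with a correct chain-of-thought solution (chosen) and an incorrect one (rejected). The prompt/chosen/rejected field names below follow the usual convention of TRL-style preference trainers and are only an assumption about the exact schema used here:

# Illustrative preference record over reasoning traces (GSM8K-style question);
# the schema actually used by this repository is defined in data/extraction.py.
example_pair = {
    "prompt": "Question: Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did she sell altogether?\nAnswer: ",
    "chosen": "In May she sold 48 / 2 = 24 clips, so altogether she sold 48 + 24 = 72 clips. The answer is 72.",
    "rejected": "In May she sold 48 * 2 = 96 clips, so altogether she sold 48 + 96 = 144 clips. The answer is 144.",
}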


Setup

For all of our experiments, we used a node with 8 NVIDIA A100 GPUs (80 GB memory each) and the following conda environment:

  • Create and activate the conda environment
conda create -n port python=3.10
conda activate port
  • Install the requirements
pip install -r requirements.txt
  • Install trl from source
pip install git+https://github.com/huggingface/trl.git
  • Log in to Weights & Biases and Hugging Face
wandb login
huggingface-cli login --token <<YOUR_TOKEN>>

Usage

Data Generation (data/)

  • all_in_one.py generates the full dataset used for all our fine-tuning experiments. It is designed to run on a node with 8 GPUs in order to produce 8 splits of the full dataset. To generate the data, run bash data_script.sh (a rough sketch of the per-GPU sharding idea is shown below).
  • extraction.py contains utility methods that extract the respective training datasets for SFT and DPO.
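Purely for intuition, the sketch below shows one way the generation work could be split into 8 shards, one per GPU, using the datasets library. The source dataset, file names, and sharding scheme are assumptions for illustration; the actual logic lives in all_in_one.py and data_script.sh.

# Minimal sketch of per-GPU sharding for trace generation (illustrative only;
# the real implementation is all_in_one.py, launched once per GPU by data_script.sh).
import argparse
from datasets import load_dataset

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--gpu_id", type=int, required=True)   # 0..7, one process per GPU
    parser.add_argument("--num_gpus", type=int, default=8)
    args = parser.parse_args()

    # Each process works on its own shard of the source questions (GSM8K used as an example).
    questions = load_dataset("gsm8k", "main", split="train")
    shard = questions.shard(num_shards=args.num_gpus, index=args.gpu_id)

    # In the real script, chain-of-thought traces would be generated for `shard`
    # on GPU `gpu_id`; here we only write the shard to a per-split file as a placeholder.
    shard.to_json(f"traces_split_{args.gpu_id}.json")

if __name__ == "__main__":
    main()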

Supervised Fine-Tuning (sft/)

  • sft.py is a modified version of the official Hugging Face example, used to run the SFT experiment.
  • sft_script.sh contains the command to run the SFT experiment; execute bash sft_script.sh. This step is required to obtain the SFT model used by the preference experiments (a rough sketch of the underlying training call is shown below).
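The actual training entry point is sft.py; purely as a rough sketch of the kind of TRL-based SFT call involved, it could look like the following. The model name, file paths, and hyperparameters are placeholders, and argument names may differ across trl versions.

# Rough SFT sketch (illustrative; the repository's real script is sft.py).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical path to the SFT split produced by data/extraction.py, with a "text" column.
train_dataset = load_dataset("json", data_files="data/sft_train.json", split="train")

training_args = SFTConfig(
    output_dir="sft-checkpoint",
    per_device_train_batch_size=4,
    num_train_epochs=1,
)

trainer = SFTTrainer(
    model="meta-llama/Llama-2-7b-hf",  # placeholder base model
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()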

Preference (preference/)

  • dpo.py is a modified version of the official Hugging Face example, used for the DPO and IPO experiments.
  • kto.py is a modified version of the official Hugging Face example, used for the KTO experiment.
  • orpo.py is a modified version of the official Hugging Face example, used for the ORPO experiment.
  • o_script.sh contains the command to run fine-tuning with the above methods; execute bash o_script.sh. This requires the SFT model from the previous step (a rough sketch of the underlying preference-training call is shown below).
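As with SFT, the real entry points are the scripts above; a rough sketch of the TRL preference-training call they wrap could look like the following. Paths and hyperparameters are placeholders, argument names vary across trl versions, and KTO/ORPO use the analogous KTOTrainer/ORPOTrainer classes rather than this exact snippet.

# Rough DPO/IPO sketch (illustrative; the repository's real scripts are dpo.py, kto.py, orpo.py).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

sft_path = "sft-checkpoint"  # placeholder path to the SFT model produced in the previous step
model = AutoModelForCausalLM.from_pretrained(sft_path)
tokenizer = AutoTokenizer.from_pretrained(sft_path)

# Preference pairs with "prompt", "chosen", "rejected" columns (see data/extraction.py).
train_dataset = load_dataset("json", data_files="data/dpo_train.json", split="train")

training_args = DPOConfig(
    output_dir="dpo-checkpoint",
    beta=0.1,             # strength of the implicit KL penalty
    loss_type="sigmoid",  # "sigmoid" for DPO, "ipo" for IPO
    per_device_train_batch_size=2,
)

trainer = DPOTrainer(
    model=model,                 # a reference model is created internally if none is passed
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # named `tokenizer=` in older trl releases
)
trainer.train()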

Evaluation

To evaluate the fine-tuned models, we used the lm-evaluation-harness. Clone the official repository into the port/ directory:

git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
  • To evaluate on the GSM8K task directly, set your arguments and run:
accelerate launch -m lm_eval --model hf  --model_args pretrained=<<YOUR_BASE_PATH>>,peft=<<YOUR_FINETUNED_MODEL_PATH>> --tasks gsm8k --log_samples --output_path=dpo_results --wandb_args project=<<W&B_PROJECT_NAME>>,name=<<RUN_NAME>>
  • To evaluate on the AQUA-RAT CoT task, move the eval_scripts/aqua-rat-cot.yaml file to lm-evaluation-harness/lm_eval/tasks/agieval/, then set your arguments and run:
accelerate launch -m lm_eval --model hf  --model_args pretrained=<<YOUR_BASE_PATH>>,peft=<<YOUR_FINETUNED_MODEL_PATH>> --tasks agieval_aqua_rat_cot --log_samples --output_path=dpo_results --wandb_args project=<<W&B_PROJECT_NAME>>,name=<<RUN_NAME>>
  • To evaluate on the ARC Challenge CoT task, move the eval_scripts/arc_challenge_cot.yaml file to lm-evaluation-harness/lm_eval/tasks/arc/, then set your arguments and run:
accelerate launch -m lm_eval --model hf  --model_args pretrained=<<YOUR_BASE_PATH>>,peft=<<YOUR_FINETUNED_MODEL_PATH>> --tasks arc_challenge_cot --log_samples --output_path=dpo_results --wandb_args project=<<W&B_PROJECT_NAME>>,name=<<RUN_NAME>>

Citation

@article{lahlou2024port,
  title   = {PORT: Preference Optimization on Reasoning Traces},
  author  = {Salem Lahlou and Abdalgader Abubaker and Hakim Hacid},
  year    = {2024},
  journal = {arXiv preprint arXiv:2406.16061}
}