This is the official code for the paper PORT: Preference Optimization on Reasoning Traces, which proposes applying preference optimization techniques to chain-of-thought reasoning steps in order to enhance the reasoning performance of language models.
For all of our experiments we used a node with 8 NVIDIA A100 GPUs (80 GB memory each) and the following conda environment:
- Create conda environment & activate:
```bash
conda create -n port python=3.10
conda activate port
```
- Install requirements:
```bash
pip install -r requirements.txt
```
- Install trl from source:
```bash
pip install git+https://github.com/huggingface/trl.git
```
- HF and W&B login:
```bash
wandb login
huggingface-cli login --token <<YOUR_TOKEN>>
```
`all_in_one.py` is used to generate the full dataset used for all our fine-tuning experiments. This file is designed to be run on a node with 8 GPUs in order to generate 8 splits of the full dataset. To generate the data, run `bash data_script.sh`.
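For orientation, here is a minimal sketch of how the 8-way splitting might look, assuming a GSM8K-style source dataset; the argument names, dataset, and generation loop are illustrative, not the exact contents of `all_in_one.py`:

```python
# Hypothetical sketch: shard a dataset across 8 GPUs so each process
# generates reasoning traces for its own slice (names are illustrative).
import argparse
from datasets import load_dataset

parser = argparse.ArgumentParser()
parser.add_argument("--split_id", type=int, required=True)  # 0..7, one per GPU
parser.add_argument("--num_splits", type=int, default=8)
args = parser.parse_args()

# Placeholder source dataset
dataset = load_dataset("gsm8k", "main", split="train")

# datasets.Dataset.shard yields non-overlapping slices of the full set
shard = dataset.shard(num_shards=args.num_splits, index=args.split_id)

for example in shard:
    # ... sample chain-of-thought traces for example["question"] here ...
    pass
```

Under this sketch, `data_script.sh` would launch one such process per GPU with `--split_id` 0 through 7.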
`extraction.py` contains utility methods that extract the respective training datasets for SFT and DPO.
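As a rough illustration of the DPO extraction step, trl expects preference data as (prompt, chosen, rejected) triples; the record keys and the correctness-based pairing below are assumptions, not the exact logic of `extraction.py`:

```python
# Hypothetical sketch: turn generated traces into (prompt, chosen, rejected)
# preference pairs, treating traces with the correct final answer as "chosen".
def build_preference_pairs(records):
    """`records` is a list of dicts with hypothetical keys:
    'question', 'trace', and 'is_correct'."""
    by_question = {}
    for r in records:
        by_question.setdefault(r["question"], []).append(r)

    pairs = []
    for question, traces in by_question.items():
        good = [t["trace"] for t in traces if t["is_correct"]]
        bad = [t["trace"] for t in traces if not t["is_correct"]]
        for chosen, rejected in zip(good, bad):
            pairs.append({"prompt": question, "chosen": chosen, "rejected": rejected})
    return pairs
```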
`sft.py` is a modified version of the HF official example to run the SFT experiment. `sft_script.sh` contains the command to run the SFT experiment via `bash sft_script.sh`. This step is required to obtain the SFT model.
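For reference, a minimal SFT sketch with trl's `SFTTrainer` (assuming a recent trl, as installed from source above); the base model, dataset, and hyperparameters are placeholders rather than the settings used in `sft.py`:

```python
# Hypothetical sketch: supervised fine-tuning with trl's SFTTrainer.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset: flatten question + gold answer into a single "text" field
train_dataset = load_dataset("gsm8k", "main", split="train").map(
    lambda ex: {"text": f"Question: {ex['question']}\nAnswer: {ex['answer']}"}
)

config = SFTConfig(
    output_dir="sft-model",
    per_device_train_batch_size=4,  # placeholder hyperparameters
    num_train_epochs=1,
)
trainer = SFTTrainer(
    model="meta-llama/Llama-2-7b-hf",  # placeholder base model
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```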
`dpo.py` is a modified version of the HF official example for the DPO & IPO experiments. `kto.py` is a modified version of the HF official example for the KTO experiment. `orpo.py` is a modified version of the HF official example to run the ORPO experiment. `o_script.sh` contains the command to run the fine-tuning with the above methods via `bash o_script.sh`. This requires the SFT model from the previous step.
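As a rough sketch of such a run with a recent trl's `DPOTrainer` (paths, dataset, and hyperparameters are placeholders, and the IPO variant is selected by changing the loss type in the config):

```python
# Hypothetical sketch: preference optimization with trl's DPOTrainer,
# starting from the SFT model produced in the previous step.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("sft-model")  # placeholder path
tokenizer = AutoTokenizer.from_pretrained("sft-model")

# Expects columns "prompt", "chosen", "rejected" (see the extraction step above)
train_dataset = load_dataset("json", data_files="pairs.json", split="train")

config = DPOConfig(
    output_dir="dpo-model",
    beta=0.1,             # placeholder
    loss_type="sigmoid",  # "sigmoid" = DPO; set "ipo" for the IPO loss
)
trainer = DPOTrainer(
    model=model,                 # ref model is cloned internally when omitted
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```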
To evaluate the fine-tuned models, we used the lm-evaluation-harness. Clone the official repo into the `port/` directory:
```bash
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
```
- To evaluate on the GSM8K task directly, set your arguments and run:
```bash
accelerate launch -m lm_eval --model hf --model_args pretrained=<<YOUR_BASE_PATH>>,peft=<<YOUR_FINETUNED_MODEL_PATH>> --tasks gsm8k --log_samples --output_path=dpo_results --wandb_args project=<<W&B_PROJECT_NAME>>,name=<<RUN_NAME>>
```
- To evaluate on the AQUA-RAT CoT task, move the `eval_scripts/aqua-rat-cot.yaml` file to `lm-evaluation-harness/lm_eval/tasks/agieval/`, then set your arguments and directly run:
```bash
accelerate launch -m lm_eval --model hf --model_args pretrained=<<YOUR_BASE_PATH>>,peft=<<YOUR_FINETUNED_MODEL_PATH>> --tasks agieval_aqua_rat_cot --log_samples --output_path=dpo_results --wandb_args project=<<W&B_PROJECT_NAME>>,name=<<RUN_NAME>>
```
- To evaluate on the ARC Challenge CoT task, move the `eval_scripts/arc_challenge_cot.yaml` file to `lm-evaluation-harness/lm_eval/tasks/arc/`, then set your arguments and directly run:
```bash
accelerate launch -m lm_eval --model hf --model_args pretrained=<<YOUR_BASE_PATH>>,peft=<<YOUR_FINETUNED_MODEL_PATH>> --tasks arc_challenge_cot --log_samples --output_path=dpo_results --wandb_args project=<<W&B_PROJECT_NAME>>,name=<<RUN_NAME>>
```
```bibtex
@article{lahlou2024port,
  title   = {PORT: Preference Optimization on Reasoning Traces},
  author  = {Salem Lahlou and Abdalgader Abubaker and Hakim Hacid},
  year    = {2024},
  journal = {arXiv preprint arXiv:2406.16061}
}
```