Commit

Merge pull request #304 from LLaVA-VL/yhzhang/llava_video_dev
chore: Update training script for LLaVA-NeXT video models
Luodian authored Oct 12, 2024
2 parents f071561 + e2ea8c4 commit 1a7e8b2
Showing 3 changed files with 5 additions and 3 deletions.
2 changes: 1 addition & 1 deletion docs/LLaVA_Video_1003.md
@@ -84,7 +84,7 @@ print(text_outputs)

## Training

-[[Scripts]](/Users/zhangyuanhan/Desktop/LLaVA-NeXT/scripts/video/train): Start training models on your single-image/multi-image/video data.
+[[Scripts]](https://github.com/LLaVA-VL/LLaVA-NeXT/blob/yhzhang/video_dev/scripts/video/train/SO400M_Qwen2_72B_ov_to_video_am9_aug6.sh): Start training models on your single-image/multi-image/video data.


## Evaluation Guidance
3 changes: 2 additions & 1 deletion scripts/video/train/SO400M_Qwen2_72B_ov_to_video_am9.sh
@@ -31,7 +31,8 @@ echo "PREV_STAGE_CHECKPOINT: ${PREV_STAGE_CHECKPOINT}"
echo "MID_RUN_NAME: ${MID_RUN_NAME}"


-ACCELERATE_CPU_AFFINITY=1 torchrun --nproc_per_node="${ARNOLD_WORKER_GPU}" --nnodes="${ARNOLD_WORKER_NUM}" --node_rank="${ARNOLD_ID}" --master_addr="${METIS_WORKER_0_HOST}" --master_port="${port_in_cmd}" \
+# ACCELERATE_CPU_AFFINITY=1 torchrun --nproc_per_node="${ARNOLD_WORKER_GPU}" --nnodes="${ARNOLD_WORKER_NUM}" --node_rank="${ARNOLD_ID}" --master_addr="${METIS_WORKER_0_HOST}" --master_port="${port_in_cmd}" \
+deepspeed --master_port 30000 \
llava/train/train_mem.py \
--deepspeed scripts/zero3.json \
--model_name_or_path $PREV_STAGE_CHECKPOINT \
3 changes: 2 additions & 1 deletion scripts/video/train/SO400M_Qwen2_7B_ov_to_video_am9.sh
@@ -31,7 +31,8 @@ echo "PREV_STAGE_CHECKPOINT: ${PREV_STAGE_CHECKPOINT}"
echo "MID_RUN_NAME: ${MID_RUN_NAME}"


-ACCELERATE_CPU_AFFINITY=1 torchrun --nproc_per_node="${ARNOLD_WORKER_GPU}" --nnodes="${ARNOLD_WORKER_NUM}" --node_rank="${ARNOLD_ID}" --master_addr="${METIS_WORKER_0_HOST}" --master_port="${port_in_cmd}" \
+# ACCELERATE_CPU_AFFINITY=1 torchrun --nproc_per_node="${ARNOLD_WORKER_GPU}" --nnodes="${ARNOLD_WORKER_NUM}" --node_rank="${ARNOLD_ID}" --master_addr="${METIS_WORKER_0_HOST}" --master_port="${port_in_cmd}" \
+deepspeed --master_port 30000 \
llava/train/train_mem.py \
--deepspeed scripts/zero3.json \
--model_name_or_path $PREV_STAGE_CHECKPOINT \
