Deadlock and CUDA OOM Issues with finetune_docowl.sh Using DeepSpeed Zero 2 and Zero 3 #114

cryingjin · 2024-10-08T04:33:26Z

Hello. Thank you very much for sharing such great results. I really want to fine-tune and use this model.

As far as I understand, running finetune_docowl.sh with DeepSpeed ZeRO stage 3 or stage 3 with offload (zero 3, zero-offload) runs into a deadlock, and the same seems to happen with finetune_docowl_lora.sh under both zero2 and zero3.

Currently, I also cannot run finetune_docowl.sh with zero2 because of a CUDA OOM error.
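For reference, here is a minimal sketch of the kind of ZeRO-2 configuration in question, with optimizer state offloaded to CPU. Offloading is a common way to lower GPU memory use under stage 2. This is not the repository's actual config: the helper name `build_zero2_offload_config`, the batch size, and the gradient accumulation value are assumptions for illustration. The keys themselves are standard DeepSpeed config fields, and whether this avoids the OOM here is untested.

```python
import json


def build_zero2_offload_config(micro_batch_size=1, grad_accum_steps=8):
    """Hypothetical helper: build a DeepSpeed ZeRO-2 config dict.

    The numeric defaults are illustrative assumptions, not values
    taken from finetune_docowl.sh.
    """
    return {
        # Smaller per-GPU batches lower activation memory; gradient
        # accumulation keeps the effective batch size up.
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
        "bf16": {"enabled": True},
        "zero_optimization": {
            # Stage 2 partitions optimizer state and gradients across ranks.
            "stage": 2,
            # Keep optimizer state in CPU memory instead of on the GPU.
            "offload_optimizer": {"device": "cpu", "pin_memory": True},
            "overlap_comm": True,
            "contiguous_gradients": True,
        },
    }


config = build_zero2_offload_config()
config_json = json.dumps(config, indent=2)
print(config_json)
```

The printed JSON could be saved to a file and passed to the training script in place of its current zero2 config, assuming the script accepts a DeepSpeed config path.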

Am I understanding this correctly? If you have resolved any of these deadlock issues, please share.
