So were you able to get it running?
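I have a 4090 with 24 GB of VRAM. Running the pretraining script ./run_pt.sh raises OutOfMemoryError. Following the instructions, I have already removed the --modules_to_save and --gradient_checkpointing arguments, and I lowered block_size to as little as 4, but the same error still occurs. No other program is using the GPU memory. With the same changes to --modules_to_save and --gradient_checkpointing, LLaMa1 pretraining completes normally, but LLaMa2 pretraining (this project) fails.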
The full error log is as follows:
How can I get LLaMa2 pretraining to run on a GPU with 24 GB of memory? Thanks in advance for any reply.