out of memory #58
Comments
Hi, you can try the scripts in tune_script_light to train under Lightning, which manages GPU memory and distributed training better; at bfloat16 precision it should be possible to train on a single 24 GB card.
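As a rough sanity check on the 24 GB claim, here is a back-of-the-envelope memory estimate; the 7B parameter count and the focus on weights only (ignoring activations and optimizer state) are assumptions, not measurements:

```python
# Rough parameter-memory estimate for a 7B-parameter model (assumed size).
params = 7e9
bytes_bf16 = 2   # bfloat16: 2 bytes per parameter
bytes_fp32 = 4   # float32: 4 bytes per parameter

gb = 1024 ** 3
weights_bf16_gb = params * bytes_bf16 / gb
weights_fp32_gb = params * bytes_fp32 / gb
print(f"bf16 weights: {weights_bf16_gb:.1f} GB, fp32 weights: {weights_fp32_gb:.1f} GB")
# bf16 weights alone (~13 GB) leave headroom on a 24 GB card, while
# fp32 weights (~26 GB) already exceed it before activations are counted.
```

This is only the weight tensor; gradients, optimizer state, and activations add more, which is why memory-saving strategies in Lightning still matter even at bf16.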
When I ran the script in light, I got an error: `AttributeError: 'GraphLlamaConfig' object has no attribute 'pretrain_graph_model_path'`

```python
if model_args.graph_tower is not None:
    self.model = GraphLlamaForCausalLM.from_pretrained(
        model_args.model_name_or_path,
        cache_dir=training_args.cache_dir,
        **bnb_model_from_pretrained_args
    )  ## TODO: add real Graph Llama model
else:
    self.model = transformers.LlamaForCausalLM.from_pretrained(
        model_args.model_name_or_path,
        cache_dir=training_args.cache_dir,
        **bnb_model_from_pretrained_args
    )
self.model.config.pretrain_graph_model_path = self.model.config.pretrain_graph_model_path + model_args.graph_tower
```

The error is raised on the last line. Does the TODO mean this part still needs to be modified, or is the problem elsewhere? How can I fix it?
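The traceback means the loaded config object simply has no `pretrain_graph_model_path` attribute (for example because the checkpoint's config.json never stored one), so reading it before assigning fails. A minimal sketch of a defensive workaround, using a stand-in config object and a hypothetical graph-tower name rather than the repo's real `GraphLlamaConfig`:

```python
class DummyConfig:
    """Stand-in for a loaded model config that is missing the attribute."""
    pass

config = DummyConfig()
graph_tower = "clip_gt_arxiv"  # hypothetical value of model_args.graph_tower

# Guard the concatenation: fall back to an empty prefix when the
# attribute was never set on the config, instead of raising.
base = getattr(config, "pretrain_graph_model_path", "")
config.pretrain_graph_model_path = base + graph_tower
print(config.pretrain_graph_model_path)
```

This only silences the crash; if the config was supposed to carry a real path prefix, the underlying cause is more likely the missing model arguments in the launch script.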
The model arguments may have been left out of the light script.
Could the problem above be solved by running distributed training on six 4090 cards?
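For reference, multi-GPU data-parallel training is typically launched with `torchrun`, which spawns one process per GPU and sets the `LOCAL_RANK`/`WORLD_SIZE` environment variables that PyTorch/Lightning read. A hypothetical launch line; the script path and flags are assumptions, not taken from this repo:

```shell
# Spawn 6 processes, one per 4090; each rank holds only its own
# shard of the batch, so per-card memory pressure drops.
torchrun --nproc_per_node=6 graphgpt/train/train_light.py \
    --bf16 True \
    --per_device_train_batch_size 1
```

Note that plain data parallelism replicates the full model on every card, so it reduces activation memory but not weight memory; sharded strategies (e.g. DeepSpeed/FSDP) are needed if the weights themselves do not fit.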
When running stage1.sh, I get the following error:

I saw the answer in an earlier issue, but it is not clear to me where the
--gpu
argument should be added. I tried adding it in stage1.sh, but got an error saying the script does not accept that argument. Could you explain exactly how this multi-GPU load flag should be added? Thanks. The stage1.sh command is as follows:
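An "unrecognized argument" error usually means the training script's argument parser simply does not define a `--gpu` flag, so passing it in stage1.sh fails regardless of position. A minimal, hypothetical sketch of how such a flag could be wired through `argparse` (the flag name and format are assumptions, not the repo's actual interface):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--gpu", type=str, default="0",
                    help="comma-separated GPU ids, e.g. '0,1,2'")

# Simulate a command line such as: python train.py --gpu 0,1,2
args = parser.parse_args(["--gpu", "0,1,2"])
gpu_ids = [int(i) for i in args.gpu.split(",")]
print(gpu_ids)  # → [0, 1, 2]
```

If the repo's script instead relies on `CUDA_VISIBLE_DEVICES` or a Lightning `devices` setting, the flag would have to match whichever mechanism the script actually parses.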