
issue of mixed-precision training. #34

Open
Bilibilee opened this issue Oct 1, 2024 · 0 comments
In script/MLLMSD_7b.sh and script/SmartEdit_7b.sh, you specify --bf16 True, yet the corresponding DeepSpeed configuration in scripts/zero_mixed.json seems to be missing the "bf16": {"enabled": "auto"} entry. As a result, the --bf16 True flag does not appear to take effect. I would like to confirm whether this is a mistake or intentional.
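
For reference, this is the entry I would expect in scripts/zero_mixed.json; with "enabled": "auto", the HuggingFace Trainer fills in the value from the --bf16 flag. The rest of the file is omitted here, so this is only a sketch of the missing section, not the full config:

```json
{
  "bf16": {
    "enabled": "auto"
  }
}
```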

Additionally, when training the MLLMSD 7b model, the logs report the following dtypes, which shows that some parts of the model are set to float32 and others to bfloat16 (a short inspection sketch follows the list):

1. model.vision_tower.dtype: torch.float32
2. model.mm_projector.dtype: torch.float32
3.1. model.model.model(LLaMA).embed_tokens.dtype: torch.float32
3.2. model.model.model(LLaMA).dtype: torch.bfloat16
3.3. model.lm_head.dtype: torch.float32
4.1. model.sd_query_tokens.dtype: torch.float32
4.2. model.sd_qformer.dtype: torch.float32
5.1. model.vae.dtype: torch.bfloat16
5.2. model.unet.dtype: torch.float32
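
The report above came from a loop equivalent to the generic inspection helper below; this is not code from this repo, and the attribute paths in the log may not match the model's direct children exactly:

```python
import torch
from torch import nn

def report_dtypes(model: nn.Module) -> None:
    # Print the set of parameter dtypes for each direct submodule.
    # Submodule names (vision_tower, mm_projector, sd_qformer, vae,
    # unet, ...) are taken from the log above and are assumptions
    # about the actual model definition.
    for name, module in model.named_children():
        dtypes = sorted({str(p.dtype) for p in module.parameters()})
        print(f"{name}: {dtypes}")
```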

It is acceptable to me that the LLM uses torch.bfloat16. However, I am curious why the VAE, which has relatively few parameters, is also set to torch.bfloat16, while the much larger UNet stays in torch.float32. Is there a specific reason for this choice?
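
My current guess, sketched below with hypothetical vae/unet arguments and not confirmed by the repo, is that the VAE is frozen and only used to encode/decode latents, so reduced precision costs nothing in gradient quality, while the trainable UNet is kept in float32 for stable weight updates:

```python
import torch
from torch import nn

def cast_for_training(vae: nn.Module, unet: nn.Module) -> None:
    # Assumption about intent, not SmartEdit's confirmed behavior:
    # the frozen VAE carries no gradients, so bfloat16 saves memory
    # for free, while the trainable UNet stays in full precision.
    vae.requires_grad_(False)
    vae.to(dtype=torch.bfloat16)
    unet.to(dtype=torch.float32)
```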
