Continual Pretraining: Unexpected Trainable Parameters in PEFT Model #1578
Comments
@Erland366 Could you take a look at the Gemma 2 finetuning notebook and see if it functions fine? Thanks :)
Wait, I'll check it out.
I think there's no problem in the code. Gemma 2, Llama 3.2, and Qwen have huge vocab sizes, so when `embed_tokens` and `lm_head` are made trainable (via `modules_to_save`) for continual pretraining, those embedding weights dominate the trainable parameter count. Smaller and larger models of the same family usually share the same vocab size, so the effect is much more pronounced on the small models. It doesn't look as bad if you use a model with a smaller vocab size, such as TinyLlama, like below:
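A rough back-of-the-envelope sketch of that arithmetic (the vocab and hidden sizes below are approximate config values, used only for illustration):

```python
# Why embed_tokens + lm_head dominate the trainable count when they are put
# in modules_to_save: each is a (vocab_size x hidden_size) matrix.
def embedding_params(vocab_size, hidden_size):
    return 2 * vocab_size * hidden_size

# gemma-2-2b (vocab ~256k, hidden ~2304): ~1.18B, close to the ~1.2B
# trainable params reported for gemma-2-2b in this issue.
print(embedding_params(256_000, 2304))
# TinyLlama-1.1B (vocab ~32k, hidden ~2048): ~0.13B, a much smaller share.
print(embedding_params(32_000, 2048))
```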
Hi @Erland366. That is also reflected in their memory usage. Have you tried changing the lora_r parameter to see if the number of trainable parameters increases or decreases? When I changed lora_r in bigger models I could see the number of trainable parameters decreasing to 5% and 1%, but in smaller models it seems to be stuck at ~30%.
I am not sure exactly why we need to save both the original module and the `modules_to_save` copy, but the original module is kept offloaded on the CPU:

```python
print(model.base_model.model.model.embed_tokens.original_module.weight.device)          # cpu
print(model.base_model.model.model.embed_tokens.modules_to_save.default.weight.device)  # cuda:0
```

Yeah, increasing r increases the trainable params, but that increase is very small. The `embed_tokens` and `lm_head` copies account for the bulk of the trainable parameters.
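To see exactly which modules the trainable parameters belong to, a generic PyTorch tally like the following can help (an illustrative sketch, assuming `model` is the PEFT-wrapped model returned by `get_peft_model`):

```python
from collections import Counter

# Tally trainable parameters, grouping all LoRA adapter weights together so the
# modules_to_save copies of embed_tokens / lm_head stand out.
counts = Counter()
for name, param in model.named_parameters():
    if param.requires_grad:
        key = "lora_adapters" if "lora_" in name else name
        counts[key] += param.numel()

for key, numel in counts.most_common():
    print(f"{key}: {numel / 1e6:.1f}M trainable parameters")
```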
Hi
I encountered unusual behavior while using the Unsloth continual pre-training notebook (https://unsloth.ai/blog/contpretraining) with small language models (1B-2B parameters).
I used model.print_trainable_parameters() to get the number of trainable parameters for Gemma 2:
```
trainable params: 1,200,414,720 || all params: 3,814,756,608 || trainable%: 31.4677
```
After patching the model (e.g., gemma-2-2b) with PEFT adapters using `FastLanguageModel.get_peft_model`, the reported trainable parameter count remains high (~1.2B, about 31%) despite using a rank of 16. This behavior persists when changing lora_r (16, 32, 64) and with other small models (llama-3.2-1B, Qwen-2.5-1.5B); it only affects small models. Patching larger models (e.g., Mistral-7B-v0.1), or any other model with more parameters, gives trainable parameters at the expected scale:
```
trainable params: 41,943,040 || all params: 7,283,675,136 || trainable%: 0.5758
```
The LoRA settings I used are along the lines of the sketch below.
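(A minimal sketch of the kind of `FastLanguageModel.get_peft_model` call used for continual pretraining; the rank, alpha, and module list here are illustrative assumptions following the Unsloth continual-pretraining recipe, not the exact values from this run.)

```python
from unsloth import FastLanguageModel

# Illustrative sketch only -- rank/alpha and target_modules follow the Unsloth
# continual-pretraining recipe, not the exact settings reported in this issue.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 32,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",
                      "embed_tokens", "lm_head"],  # embed_tokens / lm_head are made fully trainable
    use_gradient_checkpointing = "unsloth",
)
```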
Not sure what causes this behaviour; any advice would be helpful.
Thank you.