Unable to load falcon model #17
Comments
I had the same issue on Linux of the model asking to rephrase the question. Regarding the messages about offloading some weights to disk, I assume it's related to this code:

```python
import torch
from transformers import AutoModelForCausalLM

def load_peft_model():
    peft_model_id = "DioulaD/falcon-7b-instruct-qlora-ge-dq-v2"
    model = AutoModelForCausalLM.from_pretrained(
        "tiiuae/falcon-7b-instruct",
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True,
    )
    # ... (rest of the function, where peft_model_id is presumably applied, not quoted here)
```

When the model is instantiated, `device_map` is set to `"auto"`. According to this article on HF, that means the library will try to distribute the layers to make the most of your hardware: it fills your GPU's VRAM first, then your RAM, then disk. In my tests on Windows, Falcon took around 20-22 GB after loading. So depending on how much memory your GPU has (and how well Falcon's layers fit in that space) and on your RAM, it may need some disk space to fit the complete model. Have you checked whether CUDA is being detected when loading the model?
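A quick way to verify that is to check whether PyTorch sees the GPU, and to inspect where the layers actually ended up (a minimal sketch; `hf_device_map` is populated by `transformers` whenever `device_map="auto"` is used):

```python
import torch
from transformers import AutoModelForCausalLM

# If this prints False, the whole model lands in RAM and, failing that, on disk.
print("CUDA available:", torch.cuda.is_available())

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Maps each module to the device it was placed on; entries with the value
# "disk" are the layers that were offloaded and triggered those messages.
print(model.hf_device_map)
```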
Just checked, CUDA was not correctly detected when loading the model.
Still being asked to rephrase the question. I checked the error: it said the input_ids were on cuda while the model was on cpu during inference. Adding model.to(device) to the get_expectation function resolved that error but raised another one, lol: "Cannot copy out of meta tensor; no data!". Will try to dig deeper into it.
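For what it's worth, that "meta tensor" error is the expected failure when calling `model.to(device)` on a model loaded with `device_map="auto"`: layers offloaded by `accelerate` are placeholder "meta" tensors with no data, so they cannot be copied to a device. The usual workaround is to move the inputs to the model rather than the model to the inputs. Below is a sketch only; this `get_expectation` is a guess at the project's function, whose real body isn't shown in this thread:

```python
import torch

def get_expectation(model, tokenizer, prompt):
    # Hypothetical reconstruction, for illustration. Do NOT call
    # model.to(device) here: with device_map="auto", offloaded weights are
    # meta tensors, which is what raises
    # "Cannot copy out of meta tensor; no data!".
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```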
On first run on Windows (16 GB of RAM), the model is successfully downloaded, but running a query does not return the expected output: the model asks to rephrase the question. When trying to rerun the model, the following error appears.
The error seems to indicate that the weights of the model are stored in a file on disk rather than in memory. Providing an offload_folder to the from_pretrained function seems to be the fix suggested in the error, but this throws a new error.
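For reference, the fix suggested by the error message would look something like this (a sketch only; `"offload"` is an arbitrary folder name, and this only addresses the first error, not the one that followed):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    # Directory where accelerate can write the layers that fit in neither
    # VRAM nor RAM; required once any weight is offloaded to disk.
    offload_folder="offload",
)
```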