
Unable to load falcon model #17

Open
BirdiD opened this issue Jul 18, 2023 · 3 comments

Comments
@BirdiD (Owner) commented Jul 18, 2023

On the first run on Windows with 16 GB of RAM, the model is successfully downloaded, but running a query does not return the expected output: the model asks to rephrase the question. When trying to rerun the model, the following error appears:

ValueError: The current 'device_map' had weights offloaded to the disk. Please provide an 'offload_folder' for them. Alternatively, make sure you have 'safetensors' installed if the model you are using offers the weights in this format.

The error seems to indicate that the model's weights are stored in a file on disk rather than in memory. Providing an offload_folder to the from_pretrained function seems to be the fix suggested by the error, but this throws a new error:

ValueError: We need an offload_dir to dispatch this model according to this 'device_map', the following submodules need to be offloaded: base_model.model.transformer.h.24, ........
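
For reference, the attempted fix looked roughly like this (the offload path below is just a placeholder, not a specific recommendation):

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    offload_folder="offload",  # placeholder directory for offloaded weights
)

The second error mentions submodules under base_model.model, which suggests it is raised while the PEFT adapter is being dispatched, so a similar offload argument may also be needed when the adapter is loaded, depending on the peft version.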
@elsatch (Collaborator) commented Jul 19, 2023

I had the same issue on Linux: the model asked to rephrase the question.

Regarding the messages about offloading some weights to disk, I assume it's related to this code:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

def load_peft_model():
    peft_model_id = "DioulaD/falcon-7b-instruct-qlora-ge-dq-v2"
    # device_map="auto" lets accelerate spread layers across GPU, RAM and disk
    model = AutoModelForCausalLM.from_pretrained(
        "tiiuae/falcon-7b-instruct",
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True,
    )
    # presumably the function then applies the QLoRA adapter and returns it
    return PeftModel.from_pretrained(model, peft_model_id)

When the model is instantiated, device_map is set to "auto". According to this article on HF, that means the library will try to distribute the layers to make the most of your hardware: it fills your GPU's VRAM first, then your RAM, then spills over to disk.

In my tests on Windows, Falcon took around 20-22 GB after loading. So depending on your GPU's memory (and how well the Falcon layers fit in that space) and your RAM, it might need some disk space to fit the complete model.
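
If you want to see (or cap) where the layers end up, you could pass a max_memory budget and inspect the resulting placement; the sizes below are placeholders, not recommendations:

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    max_memory={0: "14GiB", "cpu": "30GiB"},  # placeholder budgets per device
)
print(model.hf_device_map)  # shows which device (or disk) each layer landed on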

Have you checked if CUDA is being detected when loading the model?
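
A quick way to verify before loading the model:

import torch

print(torch.cuda.is_available())      # should print True if CUDA is detected
print(torch.cuda.get_device_name(0))  # raises an error if no CUDA device is found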

@BirdiD (Owner, Author) commented Jul 19, 2023

Just checked: CUDA was not correctly detected when loading the model.

@BirdiD (Owner, Author) commented Jul 20, 2023

Still being asked to rephrase the question. I checked the error: during inference, the input_ids were on cuda while the model was on cpu. Adding model.to(device) to the get_expectation function resolved that error, but raised another one, lol: Cannot copy out of meta tensor; no data! Will try to dig deeper into it.
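
For what it's worth, with device_map="auto" a dispatched model generally should not be moved with model.to(device) (calling .to() on a model with offloaded weights is what triggers the meta tensor error); the usual pattern is to move the inputs to the model's device instead. A rough sketch of what the relevant part of get_expectation might look like (the tokenizer call and generation arguments are assumptions, not the repo's actual code):

inputs = tokenizer(query, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))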
