
Unable to load falcon model #17

Open
BirdiD opened this issue Jul 18, 2023 · 3 comments

Comments
@BirdiD (Owner) commented Jul 18, 2023

On the first run on Windows with 16 GB of RAM, the model is successfully downloaded, but running a query does not return the expected output: the model asks to rephrase the question. When trying to rerun the model, the following error appears:

ValueError: The current 'device_map' had weights offloaded to the disk. Please provide an 'offload_folder' for them. Alternatively, make sure you have 'safetensors' installed if the model you are using offers the weights in this format.

The error seems to indicate that the model's weights are stored in a file on disk rather than in memory. Providing an offload_folder to the from_pretrained function seems to be the fix suggested by the error, but this throws a new error:

ValueError: We need an offload_dir to dispatch this model according to this 'device_map', the following submodules need to be offloaded: base_model.model.transformer.h.24, ........
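
For reference, the attempted fix looked roughly like this (the offload path below is just a placeholder, not a specific recommendation):

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    offload_folder="offload",  # placeholder directory for offloaded weights
)

The second error mentions submodules under base_model.model, which suggests it is raised while the PEFT adapter is being dispatched, so a similar offload argument may also be needed when the adapter is loaded, depending on the peft version.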
@elsatch (Collaborator) commented Jul 19, 2023

I had the same issue on Linux: the model asked to rephrase the question.

Regarding the messages about offloading some weights to disk, I assume it's related to this code:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

def load_peft_model():
    peft_model_id = "DioulaD/falcon-7b-instruct-qlora-ge-dq-v2"
    # device_map="auto" lets accelerate spread layers across GPU, RAM and disk
    model = AutoModelForCausalLM.from_pretrained(
        "tiiuae/falcon-7b-instruct",
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True,
    )
    # presumably the function then applies the QLoRA adapter and returns it
    return PeftModel.from_pretrained(model, peft_model_id)

When the model is instantiated, device_map is set to "auto". According to this article on HF, that means the library will try to distribute the layers to make the most of your hardware: it fills your GPU's VRAM first, then your RAM, then spills over to disk.

In my tests on Windows, Falcon took around 20-22 GB after loading. So depending on your GPU's memory (and how well the Falcon layers fit in that space) and your RAM, it might need some disk space to fit the complete model.
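
If you want to see (or cap) where the layers end up, you could pass a max_memory budget and inspect the resulting placement; the sizes below are placeholders, not recommendations:

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    max_memory={0: "14GiB", "cpu": "30GiB"},  # placeholder budgets per device
)
print(model.hf_device_map)  # shows which device (or disk) each layer landed on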

Have you checked if CUDA is being detected when loading the model?
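
A quick way to verify before loading the model:

import torch

print(torch.cuda.is_available())      # should print True if CUDA is detected
print(torch.cuda.get_device_name(0))  # raises an error if no CUDA device is found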

@BirdiD (Owner, Author) commented Jul 19, 2023

Just checked: CUDA was not correctly detected when loading the model.

@BirdiD (Owner, Author) commented Jul 20, 2023

Still being asked to rephrase the question. I checked the error: during inference, the input_ids were on cuda while the model was on cpu. Adding model.to(device) to the get_expectation function resolved that error, but raised another one, lol: Cannot copy out of meta tensor; no data! Will try to dig deeper into it.
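
For what it's worth, with device_map="auto" a dispatched model generally should not be moved with model.to(device) (calling .to() on a model with offloaded weights is what triggers the meta tensor error); the usual pattern is to move the inputs to the model's device instead. A rough sketch of what the relevant part of get_expectation might look like (the tokenizer call and generation arguments are assumptions, not the repo's actual code):

inputs = tokenizer(query, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))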
