Replies: 3 comments
-
I'm trying to replicate on a small example but I'm failing to reproduce. Can you share more about the environment you're working in? Python version, all library versions, OS, etc.? Just as a quick side note, you could try:
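A minimal sketch of that kind of memory-mapped read (the file path is a hypothetical placeholder; only the Python standard library is assumed):

```python
import mmap

# Mapping the file does not copy it into RAM; pages are only faulted in for
# the bytes that are actually read.
with open("model.safetensors", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # safetensors layout: the first 8 bytes are a little-endian u64 giving the
    # JSON header length, followed by the header and then the raw tensor data.
    header_len = int.from_bytes(mm[:8], "little")
    print("mapped", mm.size(), "bytes; JSON header is", header_len, "bytes")
    mm.close()
```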
This is what should happen under the hood, and there is zero issue even if the memory mapping exceeds (by far) the available RAM.
-
Wow, rechecked again because I had 2 reports; it seems it's
This forces torch to actually allocate the entire file.
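As a general illustration of the copy-vs-map distinction (hypothetical path; numpy and torch assumed, with np.memmap standing in for the memory-mapped file view):

```python
import numpy as np
import torch

view = np.memmap("model.safetensors", dtype=np.uint8, mode="r")  # mapped view, nothing read yet
lazy = torch.from_numpy(view)   # shares the mapping: no copy, pages fault in on access
eager = torch.tensor(view)      # copies: forces the entire file to be allocated in RAM
```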
-
My VM has 32GB RAM and 2 x Nvidia Tesla V100 32GB.
I'm not able to load a safetensors file larger than my 32GB of system RAM, even though I have 64GB of VRAM available. With a smaller model, it seems like the memory is first allocated in system RAM and the model is only loaded onto my GPUs afterwards.
Is there a way to allocate the memory directly on the GPUs? I want to load safetensors files >32GB and <64GB.
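A sketch of the kind of direct-to-GPU load being asked about (assuming the safetensors Python API; the file path and the naive half/half split across the two V100s are hypothetical):

```python
from safetensors import safe_open

# Sketch: read one tensor at a time and move it straight to a GPU, so host RAM
# only ever holds a single tensor rather than the whole >32GB file.
tensors = {}
with safe_open("model.safetensors", framework="pt", device="cpu") as f:
    names = list(f.keys())
    half = len(names) // 2
    for i, name in enumerate(names):
        target = "cuda:0" if i < half else "cuda:1"   # naive split across the two GPUs
        tensors[name] = f.get_tensor(name).to(target)
```

safe_open also accepts a CUDA device directly (e.g. device="cuda:0") when everything fits on a single card.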