Generalizable multi gpu to run e.g. Llama 65b #238

Open
wants to merge 45 commits into
base: main

Conversation

thejaminator
Collaborator

@thejaminator thejaminator commented May 3, 2023

Try it out with, e.g., 2 GPUs:

elk elicit huggyllama/llama-65b imdb --num_gpus 2 --gpus_per_model 2 --int8 true

If you want to say "only use GPUs that have 30 GB available", you can pass --min_gpu_mem (in bytes) as usual:

elk elicit huggyllama/llama-65b imdb --num_gpus 2 --gpus_per_model 2 --int8 true --min_gpu_mem 32212254720
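
The --min_gpu_mem value above is 30 GiB written out in bytes. A minimal sketch for computing such a value; the helper name gib_to_bytes is made up for illustration and is not part of elk:

```python
def gib_to_bytes(gib: float) -> int:
    """Convert gibibytes to bytes (1 GiB = 1024**3 bytes)."""
    return int(gib * 1024**3)


min_gpu_mem = gib_to_bytes(30)
print(min_gpu_mem)  # 32212254720, the value passed to --min_gpu_mem above
```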

On the cluster you may get the PermissionError shown in the traceback below (I don't have permission to delete the lock file).

You can still try it out by passing different --max_examples values to bypass the cache:

elk elicit huggyllama/llama-65b imdb --num_gpus 2 --gpus_per_model 2 --int8 true --max_examples 100 100
Traceback (most recent call last):
  File "/home/james/.conda/envs/elk/bin/elk", line 8, in <module>
    sys.exit(run())
  File "/mnt/ssd-2/spar/james/elk/elk/__main__.py", line 27, in run
    run.execute()
  File "/mnt/ssd-2/spar/james/elk/elk/__main__.py", line 19, in execute
    return self.command.execute()
  File "/mnt/ssd-2/spar/james/elk/elk/run.py", line 59, in execute
    self.datasets = [
  File "/mnt/ssd-2/spar/james/elk/elk/run.py", line 60, in <listcomp>
    extract(
  File "/mnt/ssd-2/spar/james/elk/elk/extraction/extraction.py", line 479, in extract
    builder.download_and_prepare(
  File "/home/james/.conda/envs/elk/lib/python3.10/site-packages/datasets/builder.py", line 811, in download_and_prepare
    with FileLock(lock_path) if is_local else contextlib.nullcontext():
  File "/home/james/.conda/envs/elk/lib/python3.10/site-packages/datasets/utils/filelock.py", line 320, in __enter__
    self.acquire()
  File "/home/james/.conda/envs/elk/lib/python3.10/site-packages/datasets/utils/filelock.py", line 270, in acquire
    self._acquire()
  File "/home/james/.conda/envs/elk/lib/python3.10/site-packages/datasets/utils/filelock.py", line 404, in _acquire
    fd = os.open(self._lock_file, open_mode)
PermissionError: [Errno 13] Permission denied: '/mnt/ssd-2/hf_cache/generator/default-2e014cbd8695f82d/0.0.0_builder.lock'

model_devices=device_config,
verbose=is_verbose,
)
)
Collaborator Author

The device map can differ between workers if, e.g., GPUs 1-7 are all empty but GPU 0 happens to have someone else's 5 GB model on it.
Suppose you have 4 workers (2 GPUs each).

The first worker will get a different device map from the rest, since its map takes into account the 5 GB already in use.

The device map can sometimes heavily affect performance. In my experience, when you get unlucky it can be something like 1.5x slower(?). This is because you may split the model at a suboptimal spot, e.g. exactly where the model sends many more tensors between layers.

So ideally all the workers would use the same device map, which also lets you better estimate the time taken to process the dataset.
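
To make the point concrete, here is a toy sketch of how free memory changes the split point. It is not accelerate's actual placement algorithm and not the code in this PR: greedy_device_map, the layer sizes, and the "shared budget" idea are all assumptions made up for illustration.

```python
def greedy_device_map(layer_sizes_gb, free_gb):
    """Assign layers to devices in order, filling each device before moving on."""
    device_map = {}
    device = 0
    remaining = free_gb[device]
    for layer, size in enumerate(layer_sizes_gb):
        # Move to the next device while the current one cannot hold this layer.
        while size > remaining:
            device += 1
            if device >= len(free_gb):
                raise ValueError("model does not fit on the given devices")
            remaining = free_gb[device]
        device_map[layer] = device
        remaining -= size
    return device_map


layers = [10] * 6  # six hypothetical 10 GB layers

# Worker 0 sees GPU 0 with 5 GB taken by someone else; worker 1 has two empty GPUs.
worker0 = greedy_device_map(layers, free_gb=[35, 40])
worker1 = greedy_device_map(layers, free_gb=[40, 40])
print(worker0 == worker1)  # False: layer 3 lands on a different GPU

# Giving every worker the same budget (here the smallest free memory seen on
# any GPU) yields identical maps.
shared = [35, 35]
print(greedy_device_map(layers, shared) == worker0)  # True
```

With a shared budget every worker splits at the same layer, so per-worker throughput should be more comparable, at the cost of leaving some memory unused on the emptier GPUs.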

}
if use_8bit
else max_memory_used_devices
)
Collaborator Author

The TL;DR is that we need to tell infer_auto_device_map that we want to use {"cuda:0": 30gb, "cuda:1": 40gb} and NO OTHER GPUS, CPU, OR DISK.
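
A minimal sketch of that idea, under stated assumptions: the helper names build_max_memory and check_device_map are hypothetical, and the integer GPU keys with "GiB" strings follow the max_memory format that accelerate documents rather than the "cuda:0" shorthand in the comment. Because a fallback placement such as "cpu" or "disk" could still appear in a returned map, the sketch also validates the result.

```python
def build_max_memory(gpu_budgets_gib):
    """Build a max_memory dict that lists only the GPUs we want to use.

    Keys are integer GPU indices, values are strings such as "30GiB".
    """
    return {gpu: f"{gib}GiB" for gpu, gib in gpu_budgets_gib.items()}


def check_device_map(device_map, allowed_gpus):
    """Raise if any module was placed outside the allowed GPUs (e.g. cpu or disk)."""
    stray = {name: dev for name, dev in device_map.items() if dev not in allowed_gpus}
    if stray:
        raise ValueError(f"modules placed on unexpected devices: {stray}")


max_memory = build_max_memory({0: 30, 1: 40})
print(max_memory)  # {0: '30GiB', 1: '40GiB'}

# With accelerate installed, the dict would be passed along these lines:
#   device_map = infer_auto_device_map(empty_model, max_memory=max_memory)
# Here we validate a hand-written map instead.
check_device_map({"embed": 0, "layers.0": 0, "layers.1": 1}, allowed_gpus={0, 1})
```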

@thejaminator thejaminator force-pushed the generalizable-multi-gpu branch from 5299180 to 8c6386c Compare May 3, 2023 16:38
@thejaminator thejaminator force-pushed the generalizable-multi-gpu branch from 7194faa to df1c0ff Compare May 3, 2023 17:34
@thejaminator thejaminator force-pushed the generalizable-multi-gpu branch from 9bf52eb to 6d9e9ea Compare May 3, 2023 17:37
@thejaminator thejaminator force-pushed the generalizable-multi-gpu branch from d108680 to 02602cb Compare May 3, 2023 17:48
@thejaminator thejaminator force-pushed the generalizable-multi-gpu branch from b02b2d5 to d3a8f29 Compare May 3, 2023 17:51
@thejaminator thejaminator force-pushed the generalizable-multi-gpu branch from 18357c2 to bf827ea Compare May 3, 2023 17:57
@thejaminator thejaminator changed the title Generalizable multi gpu wip: Generalizable multi gpu May 3, 2023
@thejaminator thejaminator force-pushed the generalizable-multi-gpu branch from 3abb5fe to 301e6e2 Compare May 3, 2023 18:05
@thejaminator thejaminator force-pushed the generalizable-multi-gpu branch from 55490ed to a5b3d5f Compare May 3, 2023 18:10
@thejaminator thejaminator force-pushed the generalizable-multi-gpu branch from 3e9b96d to 99db2a0 Compare May 3, 2023 18:19
@thejaminator thejaminator force-pushed the generalizable-multi-gpu branch from db7acbf to 55b18ab Compare May 3, 2023 18:25
@thejaminator thejaminator changed the title wip: Generalizable multi gpu Generalizable multi gpu May 3, 2023
    **kwargs,
) -> PreTrainedModel:
    """Instantiate a model string with the appropriate `Auto` class."""
    device = torch.device(device)
    kwargs["device_map"] = {"": device}
Collaborator Author

kwargs["device_map"] = {"": device} will be passed by the caller instead. This is because, e.g., when instantiating an empty model we can't pass a device map; otherwise it will actually load the weights and it won't be an empty model anymore.

@thejaminator thejaminator changed the title Generalizable multi gpu Generalizable multi gpu to run e.g. Llama 65b May 6, 2023
@thejaminator thejaminator requested a review from norabelrose May 6, 2023 06:12
# If a torch_dtype was not specified, try to infer it.
kwargs["torch_dtype"] = torch_dtype or determine_dtypes(
    model_str=model_str, is_cpu=is_cpu, load_in_8bit=load_in_8bit
)
Collaborator Author

Made this change because it was previously setting the kwarg even when the caller of instantiate_model had already passed it, which confused me.
