Generalizable multi gpu to run e.g. Llama 65b #238
base: main
Conversation
model_devices=device_config,
verbose=is_verbose,
)
)
The device map can differ across workers if, e.g., GPUs 1-7 are all empty but GPU 0 happens to have someone else's 5 GB model on it.
Suppose you have 4 workers (2 GPUs each).
The first worker will end up with a different device map from the rest, since it takes that 5 GB of already-used memory into account.
The device map can sometimes heavily affect performance; in my experience an unlucky split can be something like 1.5x slower, because the model may get cut at a suboptimal spot, e.g. exactly where much larger tensors are passed between layers.
So ideally all the workers would use the same device map, which also makes it easier to estimate the time it will take to process the dataset; one way to keep them in sync is sketched below.
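A minimal sketch of one way to do that, assuming the workers can all-gather their per-GPU free memory (the helper names and the collective call are hypothetical; only `torch.cuda.mem_get_info` is a real API here):

```python
import torch


def free_bytes(devices: list[int]) -> list[int]:
    # torch.cuda.mem_get_info returns (free, total) bytes for a device
    return [torch.cuda.mem_get_info(d)[0] for d in devices]


def shared_max_memory(per_worker_free: list[list[int]], my_devices: list[int]) -> dict[int, int]:
    """Build a max_memory dict that every worker agrees on.

    per_worker_free[w][s] is the free memory on worker w's s-th GPU.
    Taking the minimum over workers per slot means the worker whose
    GPU 0 hosts someone else's 5 GB model constrains everyone, so all
    workers end up splitting the model at the same layer.
    """
    n_slots = len(my_devices)
    per_slot_min = [min(w[s] for w in per_worker_free) for s in range(n_slots)]
    return {dev: per_slot_min[s] for s, dev in enumerate(my_devices)}


# e.g. 4 workers with 2 GPUs each; on worker 0, my_devices = [0, 1]
# all_free = all_gather(free_bytes(my_devices))   # hypothetical collective
# max_memory = shared_max_memory(all_free, my_devices)
```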
}
if use_8bit
else max_memory_used_devices
)
TL;DR: we need to tell `infer_auto_device_map`
that we want to use {"cuda:0": 30 GB, "cuda:1": 40 GB} and NO OTHER GPUs, CPU, or disk; see the sketch below.
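Roughly, that means building the model with empty weights and handing `infer_auto_device_map` a `max_memory` dict that lists only the devices we want it to consider (the memory figures and checkpoint name below are just illustrative):

```python
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("huggyllama/llama-65b")  # example checkpoint
with init_empty_weights():
    meta_model = AutoModelForCausalLM.from_config(config)  # no weights materialized

# Only the devices listed in max_memory are considered when placing layers;
# keys are GPU indices, values are per-device memory budgets.
device_map = infer_auto_device_map(
    meta_model,
    max_memory={0: "30GiB", 1: "40GiB"},
    no_split_module_classes=["LlamaDecoderLayer"],  # keep each decoder block whole
)
```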
**kwargs,
) -> PreTrainedModel:
    """Instantiate a model string with the appropriate `Auto` class."""
    device = torch.device(device)
    kwargs["device_map"] = {"": device}
kwargs["device_map"] = {"": device} will be passed by the caller instead (because for e.g. when instantiating an empty model, we can't pass a device map. otherwise it'll really load the weights and won't be an empty model anymore
# If a torch_dtype was not specified, try to infer it.
kwargs["torch_dtype"] = torch_dtype or determine_dtypes(
    model_str=model_str, is_cpu=is_cpu, load_in_8bit=load_in_8bit
)
I made this change because previously it was setting `kwargs["torch_dtype"]` even when the caller of `instantiate_model` had already passed one, which confused me.
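For intuition, here's roughly what a helper like `determine_dtypes` might do; this is only a sketch of the idea, not the PR's actual logic:

```python
import torch
from transformers import AutoConfig


def determine_dtypes_sketch(model_str: str, is_cpu: bool, load_in_8bit: bool) -> torch.dtype:
    """Pick a sensible torch_dtype when the caller didn't specify one."""
    if load_in_8bit:
        # bitsandbytes keeps the non-quantized parts in fp16
        return torch.float16
    config_dtype = AutoConfig.from_pretrained(model_str).torch_dtype
    if config_dtype is not None and not is_cpu:
        # trust the dtype the checkpoint was saved in
        return config_dtype
    # fp16 is poorly supported on CPU, so fall back to fp32 there
    return torch.float32 if is_cpu else torch.float16
```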
Try it out with, e.g., 2 GPUs.
If you want to say "only use GPUs that have 30 GB available", you can pass `min_gpu_mem`; a rough sketch of that filter is below.
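A minimal sketch of what such a filter could look like, assuming `min_gpu_mem` is compared against the currently free memory (the real check in this PR may differ):

```python
import torch


def usable_gpus(min_gpu_mem: int) -> list[int]:
    """Indices of GPUs with at least `min_gpu_mem` free bytes."""
    usable = []
    for idx in range(torch.cuda.device_count()):
        free, _total = torch.cuda.mem_get_info(idx)
        if free >= min_gpu_mem:
            usable.append(idx)
    return usable


# e.g. only consider GPUs with at least 30 GB free
devices = usable_gpus(30 * 1024**3)
```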
As per normal on the cluster, you may get this message (I don't have perms to delete the lock file).
You can still try it out by passing different max examples params to bypass the cache.