RuntimeError During Merging Process Possibly Due to Shared Memory Tensors #41

Open
SolshineCode opened this issue Oct 20, 2024 · 13 comments

@SolshineCode

Description:
I'm encountering an error while trying to merge models using the merge.py script. The process loads the models and processes the layers correctly, but when it attempts to save the merged model, a RuntimeError is raised due to tensors sharing memory. Here's the detailed log:

This issue occurs when running the following command in a notebook:

!python dam/merge.py \
  "cerebras/Cerebras-GPT-111M" \
  "cerebras/Cerebras-GPT-111M" "Corianas/111m" \
  --output_path "/content/merged_model" \
  --device "cuda" \
  --repo_id "Solshine/Cerebras-GPT-111M-DAM-test-untrained-merge"

Log Output:

Loading base model: cerebras/Cerebras-GPT-111M
Loading models to merge:
Loading models: 100% 2/2 [00:01<00:00,  1.25it/s]
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
Processing layer norms: 0it [00:00, ?it/s]
Processing embedding layers: 100% 2/2 [00:00<00:00, 4029.11it/s]
Processing linear layers: 100% 1/1 [00:00<00:00,  3.68it/s]
Total number of parameters: 226845696
Total number of trainable parameters: 72454656
Saving merged model to /content/merged_model
Traceback (most recent call last):
  File "/content/DAM/DAM/dam/merge.py", line 267, in <module>
    main()
  File "/content/DAM/DAM/dam/merge.py", line 252, in main
    merge_models(args.base_model_id, 
  File "/content/DAM/DAM/dam/merge.py", line 215, in merge_models
    merged_model.save_pretrained(output_path)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2793, in save_pretrained
    safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "pt"})
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 286, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 488, in _flatten
    raise RuntimeError(
RuntimeError: 
            Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'transformer.wte.embeddings.0', 'lm_head.weights.0'}, {'transformer.wte.embeddings.1', 'lm_head.weights.1'}].
            A potential way to correctly save your model is to use `save_model`.
            More information at https://huggingface.co/docs/safetensors/torch_shared_tensors

Reproduction Steps:

  1. Run the merge.py script with the following parameters:
    !python dam/merge.py \
      "cerebras/Cerebras-GPT-111M" \
      "cerebras/Cerebras-GPT-111M" "Corianas/111m" \
      --output_path "/content/merged_model" \
      --device "cuda" \
      --repo_id "Solshine/DAM-test-untrained-merge"
  2. The error occurs during the save_pretrained() call when the merged model is being saved.

Expected Behavior:
The merged model should save correctly without errors.

Actual Behavior:
The process fails during the save step due to the model having tensors that share memory. The error suggests using save_model to handle shared tensors more appropriately.
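
For background, here is a tiny standalone demonstration (not DAM code) of the kind of weight tying the safetensors check reacts to; the tied copies share one storage, which is exactly what the error message reports:

import torch

emb = torch.nn.Embedding(10, 4)
head = torch.nn.Linear(4, 10, bias=False)
head.weight = emb.weight  # GPT-style weight tying: lm_head reuses the embedding matrix

# Both parameters now point at the same storage, which safetensors refuses to
# serialize as two separate entries.
print(head.weight.data_ptr() == emb.weight.data_ptr())  # True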

Troubleshooting Attempts:

  • I tried reloading the model and saving it using PyTorch's torch.save() instead of safetensors, which worked for saving but doesn't resolve the root issue with merge.py (a sketch of this workaround appears after this list).
  • I examined the contents of the merged_model folder (which is created when the command runs) and found only the config and generation_config JSON files.
  • Based on the logs, it seems that lm_head.weights and transformer.wte.embeddings may be the shared tensors causing the problem.
  • I tried different models and ran into the same or similar issue (e.g., Llama 3.2 1B).
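
For reference, a minimal sketch of the torch.save-based workaround from the first bullet above; the model ID and output path are illustrative stand-ins for the object merge.py builds, and this sidesteps the check rather than fixing merge.py:

import torch
from transformers import AutoModelForCausalLM

# Stand-in for the merged model object built by merge.py.
model = AutoModelForCausalLM.from_pretrained(
    "cerebras/Cerebras-GPT-111M", torch_dtype=torch.float32
)

# safe_serialization=False writes pytorch_model.bin via torch.save instead of
# safetensors, so the shared-memory check that raises the RuntimeError above
# is never reached.
model.save_pretrained("/content/merged_model", safe_serialization=False)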

Request for Help:

  • Can you provide a walkthrough of how to handle this error within the repository's framework?
  • Should we modify the merge process to avoid shared tensors, or is there an alternative save method that can handle this correctly?

Any guidance or suggestions to resolve this issue would be greatly appreciated!

Thank you for your time and help!

@SolshineCode changed the title from "RuntimeError During Merging Process Due to Shared Memory Tensors" to "RuntimeError During Merging Process Possibly Due to Shared Memory Tensors" on Oct 20, 2024
@SolshineCode
Author

This also happens with Qwen/Qwen2.5-0.5B

Loading base model: Qwen/Qwen2.5-0.5B
config.json: 100% 681/681 [00:00<00:00, 4.94MB/s]
model.safetensors: 100% 988M/988M [00:06<00:00, 146MB/s]
generation_config.json: 100% 138/138 [00:00<00:00, 1.02MB/s]
Loading models to merge:
Loading models:  50% 1/2 [00:01<00:01,  1.27s/it]
config.json: 100% 729/729 [00:00<00:00, 4.58MB/s]

model.safetensors: 100% 988M/988M [00:04<00:00, 226MB/s]

generation_config.json: 100% 117/117 [00:00<00:00, 710kB/s]
Loading models: 100% 2/2 [00:07<00:00,  3.81s/it]
tokenizer_config.json: 100% 7.23k/7.23k [00:00<00:00, 38.7MB/s]
vocab.json: 100% 2.78M/2.78M [00:00<00:00, 10.6MB/s]
merges.txt: 100% 1.67M/1.67M [00:00<00:00, 23.2MB/s]
tokenizer.json: 100% 7.03M/7.03M [00:00<00:00, 19.3MB/s]
Processing layer norms: 0it [00:00, ?it/s]
Processing embedding layers: 100% 2/2 [00:00<00:00, 19784.45it/s]
Processing linear layers: 100% 169/169 [00:01<00:00, 121.07it/s]
Total number of parameters: 1260786192
Total number of trainable parameters: 537360
Saving merged model to /content/merged_model
Traceback (most recent call last):
  File "/content/DAM/dam/merge.py", line 267, in <module>
    main()
  File "/content/DAM/dam/merge.py", line 252, in main
    merge_models(args.base_model_id, 
  File "/content/DAM/dam/merge.py", line 215, in merge_models
    merged_model.save_pretrained(output_path)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2793, in save_pretrained
    safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "pt"})
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 286, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 488, in _flatten
    raise RuntimeError(
RuntimeError: 
            Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'lm_head.weights.0', 'model.embed_tokens.embeddings.0'}, {'model.embed_tokens.embeddings.1', 'lm_head.weights.1'}].
            A potential way to correctly save your model is to use `save_model`.
            More information at https://huggingface.co/docs/safetensors/torch_shared_tensors

@shamanez
Member

Hi @SolshineCode, thanks for trying our work. I think the problem is with the modeling files. As you can see, we support Mistral and Llama 3 at the moment, but adding support for other models is straightforward.

@thomasgauthier could you please take a look at this further?

@SolshineCode
Author

@shamanez I really appreciate your quick reply. This is an awesome program and I'm excited to use it further.

I've replicated the error with the meta-llama/Llama-3.2-1B-Instruct model, so I believe this issue also occurs with the Llama architecture.

Error Log Snippet:

...
model.safetensors: 100% 2.47G/2.47G [00:30<00:00, 80.2MB/s]

generation_config.json: 100% 234/234 [00:00<00:00, 1.70MB/s]
Loading models: 100% 2/2 [00:37<00:00, 18.90s/it]
tokenizer_config.json: 100% 54.5k/54.5k [00:00<00:00, 809kB/s]
tokenizer.json: 100% 9.09M/9.09M [00:00<00:00, 20.7MB/s]
special_tokens_map.json: 100% 296/296 [00:00<00:00, 1.83MB/s]
Processing layer norms: 100% 33/33 [00:00<00:00, 788.86it/s]
Processing embedding layers: 100% 2/2 [00:00<00:00, 19737.90it/s]
Processing linear layers: 100% 113/113 [00:04<00:00, 27.47it/s]
Total number of parameters: 2997764096
Total number of trainable parameters: 659456
Saving merged model to /content/merged_model
Traceback (most recent call last):
  File "/content/DAM/dam/merge.py", line 267, in <module>
    main()
  File "/content/DAM/dam/merge.py", line 252, in main
    merge_models(args.base_model_id, 
  File "/content/DAM/dam/merge.py", line 215, in merge_models
    merged_model.save_pretrained(output_path)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2793, in save_pretrained
    safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "pt"})
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 286, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 488, in _flatten
    raise RuntimeError(
RuntimeError: 
            Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'lm_head.weights.0', 'model.embed_tokens.embeddings.0'}, {'model.embed_tokens.embeddings.1', 'lm_head.weights.1'}].
            A potential way to correctly save your model is to use `save_model`.
            More information at https://huggingface.co/docs/safetensors/torch_shared_tensors

And the saved merged_model folder again contains only the two config JSON files.

[Screenshot: the merged_model folder containing only the two config files]

@shamanez
Member

shamanez commented Oct 21, 2024

Can you please try this command?

python dam/merge.py mistralai/Mistral-7B-v0.1 augmxnt/shisa-gamma-7b-v1 WizardLM/WizardMath-7B-V1.1 arcee-train/Abel-7B-002-truncated-embeds --device cuda --output_path ./merged_model --repo_id arcee-train/[prefix]-untrained-merge

**Also, we wanted this merge operation to happen on CPUs; I think there's a little bug there. Can you please remove the "cuda" flag as well? This operation doesn't need a GPU.**

Adding to this, the logic behind adding a new model is here: https://github.com/arcee-ai/DAM/blob/main/dam/merge.py#L21

@SolshineCode
Author

SolshineCode commented Oct 21, 2024

I tried that in this context but it seems that's too large an operation for the Google Colab notebook I'm using, so I'll have to check it out when I'm back at a real PC.

I was hoping to make and release a notebook that runs DAM for tiny models within the Colab free T4 allotment. It quits at "pytorch_model-00001-of-00002.bin: 49% 4.83G/9.94G [03:18<03:28, 24.5MB/s]" during the base model download.

This DAM project is incredibly cool and reinvigorates my faith in distributed-polysemanticity interpretations of neural networks. Great work!

I'll take a look at the logic behind adding a new model and may make a PR for the newer Llama models if I can figure it out. It would be good to be able to use this on SOTA tiny LLM architectures. Thanks!

@SolshineCode
Author

SolshineCode commented Oct 21, 2024

Confirming: it currently only works for Llama 3 (and Mistral), not Llama 3.2?

@shamanez
Copy link
Member

Adding new models is super easy and a two-minute thing. @thomasgauthier, maybe we can add a description to the README.

@SolshineCode In the meantime, feel free to open a PR :). Thanks again for your valuable feedback!

@SolshineCode
Author

SolshineCode commented Oct 21, 2024

It would seem a two-minute thing, but I'm puzzled why it doesn't just work as it does for Llama 3, since the Llama 3.2 1B model_type is labelled as "llama" in the config on the HF Hub and the merge.py line you linked already accounts for the llama model_type. Yet my notebook exhibits the same failure to properly save the merged files with the Llama 3.2 1B architecture (noted and screenshotted above).

[Screenshot]

@SolshineCode
Author

SolshineCode commented Oct 21, 2024

As can be seen here again with Llama 3.2 1B (this time three models listed instead of two):

!python dam/merge.py \
  "meta-llama/Llama-3.2-1B" \
  "meta-llama/Llama-3.2-1B" "meta-llama/Llama-3.2-1B-Instruct" "unsloth/Llama-3.2-1B-Instruct" \
  --output_path "/content/merged_model" \
  --device "cuda" \
  --repo_id "Solshine/llama-3-2-1B-DAM-test-untrained-merge"

The results shown in this screenshot are the same as the error quoted above:

[Screenshot: the same shared-memory RuntimeError raised during save_pretrained]

@SolshineCode
Author

I submitted a PR changing the save method used in merge.py to save_model:

#44
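
For context, a rough sketch of the kind of change involved, assuming merge.py builds a merged_model object and currently calls merged_model.save_pretrained(output_path); the actual PR diff may differ:

import os
from safetensors.torch import save_model

def save_merged(merged_model, output_path):
    os.makedirs(output_path, exist_ok=True)
    # Keep the config files that save_pretrained would otherwise write.
    merged_model.config.save_pretrained(output_path)
    # safetensors' save_model de-duplicates tensors that share storage before
    # serializing, instead of raising the way save_file does.
    save_model(merged_model, os.path.join(output_path, "model.safetensors"))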

@SolshineCode
Author

SolshineCode commented Oct 22, 2024

/ EDIT: I now believe what's covered in this comment is a separate issue that I'll probably open separately later. /

The method in my PR (save_model with safetensors) worked for that code chunk with the newer Llama 3.2 architecture but failed for the original Llama 8B, so I changed it to simply turning off safetensors (using save_pretrained), which works with both Llama versions for that code chunk (executing merge.py). However, I'm then running into another error when trying to train the merged model, and I'm not sure whether it's caused by this change or something else.
The output from python dam/train_dam.py reads:

2024-10-21 20:39:25.652250: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-21 20:39:27.148897: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Loading model from /content/merged_model with cache dir /content/cache
Traceback (most recent call last):
  File "/content/DAM/dam/train_dam.py", line 158, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/content/DAM/dam/train_dam.py", line 59, in main
    model = prepare_model(untrained_merged_model_name, cache_dir=cache_dir)
  File "/content/DAM/dam/model_preparation.py", line 57, in prepare_model
    merged_model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto", cache_dir=cache_dir)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3905, in from_pretrained
    model.tie_weights()
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 1832, in tie_weights
    self._tie_or_clone_weights(output_embeddings, self.get_input_embeddings())
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 1944, in _tie_or_clone_weights
    output_embeddings.weight = input_embeddings.weight
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1729, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'DAMEmbeddingLayer' object has no attribute 'weight'
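
My current reading of this traceback (a hypothesis, not a confirmed diagnosis): from_pretrained calls tie_weights(), which assigns output_embeddings.weight = input_embeddings.weight, so whatever get_input_embeddings() returns must expose a .weight tensor, and DAMEmbeddingLayer apparently does not. A minimal sketch of one possible workaround, assuming the merged model's config honors tie_word_embeddings:

import torch
from transformers import AutoConfig, AutoModelForCausalLM

# Hypothetical workaround: disable weight tying at load time so tie_weights()
# never touches the custom embedding module. The path is the one from the log.
config = AutoConfig.from_pretrained("/content/merged_model")
config.tie_word_embeddings = False
model = AutoModelForCausalLM.from_pretrained(
    "/content/merged_model",
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)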

I will continue to explore this and maybe others on the project can provide insights as well. Thank you!!

@SolshineCode
Author

SolshineCode commented Oct 22, 2024

I now believe these are separate issues, both arising from differences between the Llama 3.2 architecture and the original Llama architecture.
I've closed the first PR and have submitted this one instead for this issue:
#46
Later, after more exploration, I will open a separate issue for the error noted in my most recent comment above.
Thanks!

@SolshineCode
Author

SolshineCode commented Oct 29, 2024

The above-noted error from dam/train_dam.py stemmed from my switch away from safetensors to pickle files, so I closed my other PR since it would have caused this downstream issue, and I am re-investigating how to make this repo compatible with SOTA models such as Llama 3.2. Any input on this issue is warmly welcomed (one rough idea I'm considering is sketched below). Thanks!
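
One direction I'm considering (untested against the repo, and it duplicates the tied weights on disk): clone every tensor before saving so nothing shares storage and the safetensors path goes through. The model ID below is just a stand-in for the object merge.py builds:

from transformers import AutoModelForCausalLM

# Stand-in for the merged model produced by merge.py.
merged_model = AutoModelForCausalLM.from_pretrained("cerebras/Cerebras-GPT-111M")

# Clone every tensor so no two entries share storage; the safetensors writer
# then has nothing to reject, at the cost of storing the tied weights twice.
state_dict = {
    k: v.detach().clone().contiguous()
    for k, v in merged_model.state_dict().items()
}
merged_model.save_pretrained("/content/merged_model", state_dict=state_dict)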
