RuntimeError During Merging Process Possibly Due to Shared Memory Tensors #41

Open
SolshineCode opened this issue Oct 20, 2024 · 13 comments

@SolshineCode

Description:
I'm encountering an error while trying to merge models using the merge.py script. The process loads the models and processes the layers correctly, but when it attempts to save the merged model, a RuntimeError is raised due to tensors sharing memory. Here's the detailed log:

This issue occurs when running the following command in a notebook:

!python dam/merge.py \
  "cerebras/Cerebras-GPT-111M" \
  "cerebras/Cerebras-GPT-111M" "Corianas/111m" \
  --output_path "/content/merged_model" \
  --device "cuda" \
  --repo_id "Solshine/Cerebras-GPT-111M-DAM-test-untrained-merge"

Log Output:

Loading base model: cerebras/Cerebras-GPT-111M
Loading models to merge:
Loading models: 100% 2/2 [00:01<00:00,  1.25it/s]
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
Processing layer norms: 0it [00:00, ?it/s]
Processing embedding layers: 100% 2/2 [00:00<00:00, 4029.11it/s]
Processing linear layers: 100% 1/1 [00:00<00:00,  3.68it/s]
Total number of parameters: 226845696
Total number of trainable parameters: 72454656
Saving merged model to /content/merged_model
Traceback (most recent call last):
  File "/content/DAM/DAM/dam/merge.py", line 267, in <module>
    main()
  File "/content/DAM/DAM/dam/merge.py", line 252, in main
    merge_models(args.base_model_id, 
  File "/content/DAM/DAM/dam/merge.py", line 215, in merge_models
    merged_model.save_pretrained(output_path)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2793, in save_pretrained
    safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "pt"})
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 286, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 488, in _flatten
    raise RuntimeError(
RuntimeError: 
            Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'transformer.wte.embeddings.0', 'lm_head.weights.0'}, {'transformer.wte.embeddings.1', 'lm_head.weights.1'}].
            A potential way to correctly save your model is to use `save_model`.
            More information at https://huggingface.co/docs/safetensors/torch_shared_tensors

Reproduction Steps:

  1. Run the merge.py script with the following parameters:
    !python dam/merge.py \
      "cerebras/Cerebras-GPT-111M" \
      "cerebras/Cerebras-GPT-111M" "Corianas/111m" \
      --output_path "/content/merged_model" \
      --device "cuda" \
      --repo_id "Solshine/DAM-test-untrained-merge"
  2. The error occurs during the save_pretrained() call when the merged model is being saved.

Expected Behavior:
The merged model should save correctly without errors.

Actual Behavior:
The process fails during the save step due to the model having tensors that share memory. The error suggests using save_model to handle shared tensors more appropriately.
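
For background, here is a tiny standalone demonstration (not DAM code) of the kind of weight tying the safetensors check reacts to; the tied copies share one storage, which is exactly what the error message reports:

import torch

emb = torch.nn.Embedding(10, 4)
head = torch.nn.Linear(4, 10, bias=False)
head.weight = emb.weight  # GPT-style weight tying: lm_head reuses the embedding matrix

# Both parameters now point at the same storage, which safetensors refuses to
# serialize as two separate entries.
print(head.weight.data_ptr() == emb.weight.data_ptr())  # True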

Troubleshooting Attempts:

  • I tried reloading the model and saving it using PyTorch's torch.save() instead of safetensors, which worked for saving but doesn't resolve the root issue with merge.py (a sketch of this workaround appears after this list).
  • I examined the contents of the merged_model folder (which is created when the command runs) and found only the config and generation_config JSON files.
  • Based on the logs, it seems that lm_head.weights and transformer.wte.embeddings may be the shared tensors causing the problem.
  • I tried different models and ran into the same or similar issue (e.g., Llama 3.2 1B).
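
For reference, a minimal sketch of the torch.save-based workaround from the first bullet above; the model ID and output path are illustrative stand-ins for the object merge.py builds, and this sidesteps the check rather than fixing merge.py:

import torch
from transformers import AutoModelForCausalLM

# Stand-in for the merged model object built by merge.py.
model = AutoModelForCausalLM.from_pretrained(
    "cerebras/Cerebras-GPT-111M", torch_dtype=torch.float32
)

# safe_serialization=False writes pytorch_model.bin via torch.save instead of
# safetensors, so the shared-memory check that raises the RuntimeError above
# is never reached.
model.save_pretrained("/content/merged_model", safe_serialization=False)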

Request for Help:

  • Can you provide a walkthrough of how to handle this error within the repository's framework?
  • Should we modify the merge process to avoid shared tensors, or is there an alternative save method that can handle this correctly?

Any guidance or suggestions to resolve this issue would be greatly appreciated!

Thank you for your time and help!

@SolshineCode changed the title from "RuntimeError During Merging Process Due to Shared Memory Tensors" to "RuntimeError During Merging Process Possibly Due to Shared Memory Tensors" on Oct 20, 2024
@SolshineCode
Author

This also happens with Qwen/Qwen2.5-0.5B

Loading base model: Qwen/Qwen2.5-0.5B
config.json: 100% 681/681 [00:00<00:00, 4.94MB/s]
model.safetensors: 100% 988M/988M [00:06<00:00, 146MB/s]
generation_config.json: 100% 138/138 [00:00<00:00, 1.02MB/s]
Loading models to merge:
Loading models:  50% 1/2 [00:01<00:01,  1.27s/it]
config.json: 100% 729/729 [00:00<00:00, 4.58MB/s]

model.safetensors: 100% 988M/988M [00:04<00:00, 226MB/s]

generation_config.json: 100% 117/117 [00:00<00:00, 710kB/s]
Loading models: 100% 2/2 [00:07<00:00,  3.81s/it]
tokenizer_config.json: 100% 7.23k/7.23k [00:00<00:00, 38.7MB/s]
vocab.json: 100% 2.78M/2.78M [00:00<00:00, 10.6MB/s]
merges.txt: 100% 1.67M/1.67M [00:00<00:00, 23.2MB/s]
tokenizer.json: 100% 7.03M/7.03M [00:00<00:00, 19.3MB/s]
Processing layer norms: 0it [00:00, ?it/s]
Processing embedding layers: 100% 2/2 [00:00<00:00, 19784.45it/s]
Processing linear layers: 100% 169/169 [00:01<00:00, 121.07it/s]
Total number of parameters: 1260786192
Total number of trainable parameters: 537360
Saving merged model to /content/merged_model
Traceback (most recent call last):
  File "/content/DAM/dam/merge.py", line 267, in <module>
    main()
  File "/content/DAM/dam/merge.py", line 252, in main
    merge_models(args.base_model_id, 
  File "/content/DAM/dam/merge.py", line 215, in merge_models
    merged_model.save_pretrained(output_path)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2793, in save_pretrained
    safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "pt"})
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 286, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 488, in _flatten
    raise RuntimeError(
RuntimeError: 
            Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'lm_head.weights.0', 'model.embed_tokens.embeddings.0'}, {'model.embed_tokens.embeddings.1', 'lm_head.weights.1'}].
            A potential way to correctly save your model is to use `save_model`.
            More information at https://huggingface.co/docs/safetensors/torch_shared_tensors

@shamanez
Member

Hi @SolshineCode, thanks for trying our work. I think the problem is with the modeling files. As you can see, we support Mistral and Llama 3 at the moment, but adding support for other models is straightforward.

@thomasgauthier could you please take a look at this further?

@SolshineCode
Author

@shamanez I really appreciate your quick reply. This is an awesome program and I'm excited to use it further.

I've replicated the error with the meta-llama/Llama-3.2-1B-Instruct model, so I believe this issue also occurs with the Llama architecture.

Error Log Snippet:

...
model.safetensors: 100% 2.47G/2.47G [00:30<00:00, 80.2MB/s]

generation_config.json: 100% 234/234 [00:00<00:00, 1.70MB/s]
Loading models: 100% 2/2 [00:37<00:00, 18.90s/it]
tokenizer_config.json: 100% 54.5k/54.5k [00:00<00:00, 809kB/s]
tokenizer.json: 100% 9.09M/9.09M [00:00<00:00, 20.7MB/s]
special_tokens_map.json: 100% 296/296 [00:00<00:00, 1.83MB/s]
Processing layer norms: 100% 33/33 [00:00<00:00, 788.86it/s]
Processing embedding layers: 100% 2/2 [00:00<00:00, 19737.90it/s]
Processing linear layers: 100% 113/113 [00:04<00:00, 27.47it/s]
Total number of parameters: 2997764096
Total number of trainable parameters: 659456
Saving merged model to /content/merged_model
Traceback (most recent call last):
  File "/content/DAM/dam/merge.py", line 267, in <module>
    main()
  File "/content/DAM/dam/merge.py", line 252, in main
    merge_models(args.base_model_id, 
  File "/content/DAM/dam/merge.py", line 215, in merge_models
    merged_model.save_pretrained(output_path)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2793, in save_pretrained
    safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "pt"})
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 286, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 488, in _flatten
    raise RuntimeError(
RuntimeError: 
            Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'lm_head.weights.0', 'model.embed_tokens.embeddings.0'}, {'model.embed_tokens.embeddings.1', 'lm_head.weights.1'}].
            A potential way to correctly save your model is to use `save_model`.
            More information at https://huggingface.co/docs/safetensors/torch_shared_tensors

And the saved merged_model folder again contains only the two config JSON files.

[Screenshot: the merged_model folder containing only the two config files]

@shamanez
Member

shamanez commented Oct 21, 2024

Can you please try this command?

python dam/merge.py mistralai/Mistral-7B-v0.1 augmxnt/shisa-gamma-7b-v1 WizardLM/WizardMath-7B-V1.1 arcee-train/Abel-7B-002-truncated-embeds --device cuda --output_path ./merged_model --repo_id arcee-train/[prefix]-untrained-merge

**Also, we wanted this merge operation to happen on CPUs; I think there's a little bug there. Can you please remove the "cuda" flag as well? This operation doesn't need a GPU.**

Adding to this, the logic behind adding a new model is here: https://github.com/arcee-ai/DAM/blob/main/dam/merge.py#L21

@SolshineCode
Author

SolshineCode commented Oct 21, 2024

I tried that in this context but it seems that's too large an operation for the Google Colab notebook I'm using, so I'll have to check it out when I'm back at a real PC.

I was hoping to make and release a notebook that runs DAM for tiny models within the Colab free T4 allotment. It quits at "pytorch_model-00001-of-00002.bin: 49% 4.83G/9.94G [03:18<03:28, 24.5MB/s]" during the base model download.

This DAM project is incredibly cool and reinvigorates my faith in distributed-polysemanticity interpretations of neural networks. Great work!

I'll take a look at the logic behind adding a new model and may make a PR for the newer Llama models if I can figure it out. It would be good to be able to use this on SOTA tiny LLM architectures. Thanks!

@SolshineCode
Author

SolshineCode commented Oct 21, 2024

Confirming: it currently only works for Llama 3 (and Mistral), not Llama 3.2?

@shamanez
Copy link
Member

Adding new models is super easy and a two-minute thing. @thomasgauthier, maybe we can add a description to the README.

@SolshineCode In the meantime, feel free to open a PR :). Thanks again for your valuable feedback!

@SolshineCode
Author

SolshineCode commented Oct 21, 2024

It would seem a two-minute thing, but I'm puzzled why it doesn't just work as it does for Llama 3, since the Llama 3.2 1B model_type is labelled as "llama" in the config on the HF Hub and the merge.py line you linked already accounts for the llama model_type. Yet my notebook exhibits the same failure to properly save the merged files with the Llama 3.2 1B architecture (noted and screenshotted above).

[Screenshot]

@SolshineCode
Author

SolshineCode commented Oct 21, 2024

As can be seen here again with Llama 3.2 1B (this time three models listed instead of two):

!python dam/merge.py \
  "meta-llama/Llama-3.2-1B" \
  "meta-llama/Llama-3.2-1B" "meta-llama/Llama-3.2-1B-Instruct" "unsloth/Llama-3.2-1B-Instruct" \
  --output_path "/content/merged_model" \
  --device "cuda" \
  --repo_id "Solshine/llama-3-2-1B-DAM-test-untrained-merge"

The results shown in this screenshot are the same as the error quoted above:

[Screenshot: the same shared-memory RuntimeError raised during save_pretrained]

@SolshineCode
Author

I submitted a PR changing the save method used in merge.py to save_model:

#44
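
For context, a rough sketch of the kind of change involved, assuming merge.py builds a merged_model object and currently calls merged_model.save_pretrained(output_path); the actual PR diff may differ:

import os
from safetensors.torch import save_model

def save_merged(merged_model, output_path):
    os.makedirs(output_path, exist_ok=True)
    # Keep the config files that save_pretrained would otherwise write.
    merged_model.config.save_pretrained(output_path)
    # safetensors' save_model de-duplicates tensors that share storage before
    # serializing, instead of raising the way save_file does.
    save_model(merged_model, os.path.join(output_path, "model.safetensors"))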

@SolshineCode
Author

SolshineCode commented Oct 22, 2024

/ EDIT: I now believe what's covered in this comment is a separate issue that I'll probably open separately later. /

The method in my PR (save_model with safetensors) worked for that code chunk with the newer Llama 3.2 architecture but failed for the original Llama 8B, so I changed it to simply turning off safetensors (using save_pretrained), which works with both Llama versions for that code chunk (executing merge.py). However, I'm then running into another error when trying to train the merged model, and I'm not sure whether it's caused by this change or something else.
The output from python dam/train_dam.py reads:

2024-10-21 20:39:25.652250: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-21 20:39:27.148897: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Loading model from /content/merged_model with cache dir /content/cache
Traceback (most recent call last):
  File "/content/DAM/dam/train_dam.py", line 158, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/content/DAM/dam/train_dam.py", line 59, in main
    model = prepare_model(untrained_merged_model_name, cache_dir=cache_dir)
  File "/content/DAM/dam/model_preparation.py", line 57, in prepare_model
    merged_model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto", cache_dir=cache_dir)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3905, in from_pretrained
    model.tie_weights()
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 1832, in tie_weights
    self._tie_or_clone_weights(output_embeddings, self.get_input_embeddings())
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 1944, in _tie_or_clone_weights
    output_embeddings.weight = input_embeddings.weight
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1729, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'DAMEmbeddingLayer' object has no attribute 'weight'
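
My current reading of this traceback (a hypothesis, not a confirmed diagnosis): from_pretrained calls tie_weights(), which assigns output_embeddings.weight = input_embeddings.weight, so whatever get_input_embeddings() returns must expose a .weight tensor, and DAMEmbeddingLayer apparently does not. A minimal sketch of one possible workaround, assuming the merged model's config honors tie_word_embeddings:

import torch
from transformers import AutoConfig, AutoModelForCausalLM

# Hypothetical workaround: disable weight tying at load time so tie_weights()
# never touches the custom embedding module. The path is the one from the log.
config = AutoConfig.from_pretrained("/content/merged_model")
config.tie_word_embeddings = False
model = AutoModelForCausalLM.from_pretrained(
    "/content/merged_model",
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)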

I will continue to explore this and maybe others on the project can provide insights as well. Thank you!!

@SolshineCode
Author

SolshineCode commented Oct 22, 2024

I now believe these are separate issues, both arising from differences between the Llama 3.2 architecture and the original Llama architecture.
I've closed the first PR and have submitted this one instead for this issue:
#46
Later, after more exploration, I will open a separate issue for the error noted in my most recent comment above.
Thanks!

@SolshineCode
Author

SolshineCode commented Oct 29, 2024

The above-noted error from dam/train_dam.py stemmed from my switch away from safetensors to pickle files, so I closed my other PR since it would have caused this downstream issue, and I am re-investigating how to make this repo compatible with SOTA models such as Llama 3.2. Any input on this issue is warmly welcomed (one rough idea I'm considering is sketched below). Thanks!
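
One direction I'm considering (untested against the repo, and it duplicates the tied weights on disk): clone every tensor before saving so nothing shares storage and the safetensors path goes through. The model ID below is just a stand-in for the object merge.py builds:

from transformers import AutoModelForCausalLM

# Stand-in for the merged model produced by merge.py.
merged_model = AutoModelForCausalLM.from_pretrained("cerebras/Cerebras-GPT-111M")

# Clone every tensor so no two entries share storage; the safetensors writer
# then has nothing to reject, at the cost of storing the tied weights twice.
state_dict = {
    k: v.detach().clone().contiguous()
    for k, v in merged_model.state_dict().items()
}
merged_model.save_pretrained("/content/merged_model", state_dict=state_dict)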
