
tests/test_cli.py::ModelEstimatorTester::test_no_split_modules fails after 84a67891 in Transformers #3362

Open
dvrogozh opened this issue Jan 23, 2025 · 3 comments

Comments

@dvrogozh
Contributor

With:

On:

  • A10 Nvidia
  • or Intel Data Center Max (PVC)

test_no_split_modules fails:

$ python3 -m pytest tests/test_cli.py::ModelEstimatorTester::test_no_split_modules
======================================= test session starts =======================================
platform linux -- Python 3.10.12, pytest-7.4.4, pluggy-1.5.0
rootdir: /home/dvrogozh/git/huggingface/accelerate
plugins: rich-0.2.0, hypothesis-6.123.13, xdist-3.6.1, asyncio-0.23.8, timeout-2.3.1
asyncio: mode=strict
collected 1 item

tests/test_cli.py F                                                                         [100%]

============================================ FAILURES =============================================
___________________________ ModelEstimatorTester.test_no_split_modules ____________________________

self = <test_cli.ModelEstimatorTester testMethod=test_no_split_modules>

    @require_transformers
    def test_no_split_modules(self):
        # idefics-80b-instruct has ["IdeficsDecoderLayer", "IdeficsGatedCrossAttentionLayer"]
        args = self.parser.parse_args(["HuggingFaceM4/idefics-80b-instruct", "--dtypes", "float32"])
        output = gather_data(args)
        # without factoring in `no_split` modules, the largest layer is 721420288 bytes
        assert output[0][1] != 721420288, "Largest layer calculation incorrect, did not factor in `no_split` modules."
        # the real answer is 3240165632 bytes
>       assert output[0][1] == 3240165632
E       assert 1620082944 == 3240165632

tests/test_cli.py:502: AssertionError
-------------------------------------- Captured stdout call ---------------------------------------
Loading pretrained config for `HuggingFaceM4/idefics-80b-instruct` from `transformers`...
===================================== short test summary info =====================================
FAILED tests/test_cli.py::ModelEstimatorTester::test_no_split_modules - assert 1620082944 == 3240165632
======================================== 1 failed in 5.88s ========================================

This issue appeared after this change in Transformers by @zucchini-nlp:

commit 84a6789145c3d728f2e405d31e9a35df5d74f05c
Author: Raushan Turganbay <[email protected]>
Date:   Mon Jan 13 13:42:08 2025 +0100

    Enable different torch dtype in sub models (#34873)

Here is the bisect log (the good starting revision 241c04d368 is actually Transformers v4.47.1):

$ git bisect log
git bisect start
# bad: [2c3a44f9a769e98597d62ecdc7383785318be5a2] Fix NoneType type as it requires py>=3.10 (#35843
git bisect bad 2c3a44f9a769e98597d62ecdc7383785318be5a2
# good: [241c04d36867259cdf11dbb4e9d9a60f9cb65ebc] [Whisper] patch float type on mps (#35295)
git bisect good 241c04d36867259cdf11dbb4e9d9a60f9cb65ebc
# good: [a5bb52847139bf6ad7489ac62a5fb6d0fa3d2ec6] Fix signatures for processing kwargs (#35105)
git bisect good a5bb52847139bf6ad7489ac62a5fb6d0fa3d2ec6
# good: [ed73ae210b87a86de5467d93bdbd4b5834f4f00c] NPU support SDPA (#35165)
git bisect good ed73ae210b87a86de5467d93bdbd4b5834f4f00c
# bad: [b0cdbd911941bdd596ac753fd33849ef83b1a841] Enhanced Installation Section in README.md (#3509
git bisect bad b0cdbd911941bdd596ac753fd33849ef83b1a841
# good: [633da1b10e61a3727ab2d4b0e186191b2590d092] [Idefics3] Move image features to same device as
git bisect good 633da1b10e61a3727ab2d4b0e186191b2590d092
# good: [bbc00046b9b746d5ea69a27a58e652fdf617c91c] Fix flaky `test_custom_4d_attention_mask` (#3560
git bisect good bbc00046b9b746d5ea69a27a58e652fdf617c91c
# bad: [84a6789145c3d728f2e405d31e9a35df5d74f05c] Enable different torch dtype in sub models (#3487
git bisect bad 84a6789145c3d728f2e405d31e9a35df5d74f05c
# good: [1e3c6c1f7d7fbb4862ff036a9bdf96e6ae2e2511] Skip `MobileNetV1ModelTest::test_batching_equiva
git bisect good 1e3c6c1f7d7fbb4862ff036a9bdf96e6ae2e2511
# good: [b8c34d97fcf6685a480e974f9c88bf04cb3088a9] Fix whisper compile (#35413)
git bisect good b8c34d97fcf6685a480e974f9c88bf04cb3088a9
# good: [87089176d9a205d69e7d4e152fc25950d77580d2] [`Phi`] bias should be True (#35650)
git bisect good 87089176d9a205d69e7d4e152fc25950d77580d2
# first bad commit: [84a6789145c3d728f2e405d31e9a35df5d74f05c] Enable different torch dtype in sub

From the change, I guess that the Accelerate test got affected by a changed size of one of the layers that the test is sensitive to. If so, this is unlikely to be an issue with the Transformers change itself; rather, the Accelerate test probably needs to be adjusted to account for the Transformers fix.
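
(Side note, not part of the original report: a quick check on the numbers from the failure above shows the reported size is roughly half of the expected one, which would be consistent with the largest no_split group being sized in float16 instead of the requested float32.)

expected = 3240165632  # bytes the test expects for the largest no_split group (float32)
observed = 1620082944  # bytes reported after commit 84a67891
print(observed / expected)  # ~0.5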

CC: @zucchini-nlp @ydshieh

@zucchini-nlp
Member

This might also be caused by Transformers: currently, when loading composite models like VLMs with from_pretrained(torch_dtype=torch.float32), the model is loaded in auto dtype. Only string dtypes work; my bad for not noticing it. The fix is in huggingface/transformers#35728 and I will merge it soon.
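
For illustration, a minimal sketch of the behavior described above (the model name is a placeholder, and the fallback to "auto" is as reported in this thread, not verified independently):

import torch
from transformers import AutoModel

# String dtype: respected, the composite model's weights end up in float32.
model = AutoModel.from_pretrained("some/composite-vlm", torch_dtype="float32")

# torch.dtype object: on the affected Transformers revisions, composite models
# reportedly fall back to torch_dtype="auto" and keep the checkpoint dtype.
model = AutoModel.from_pretrained("some/composite-vlm", torch_dtype=torch.float32)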

@dvrogozh
Contributor Author

The fix is in huggingface/transformers#35728

@zucchini-nlp: I've tried this PR. Unfortunately, it does not fix the reported issue in Accelerate.

@zucchini-nlp
Member

OK, I found the issue: it was related to the model being initialized from a config instead of being loaded from a checkpoint. I'll fix it on the Transformers side.
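
For context, a rough, simplified sketch (not the actual Accelerate source) of the code path in question: the memory estimator instantiates the model from its config on the meta device instead of loading a checkpoint, so any dtype handling that only runs inside from_pretrained is bypassed.

import torch
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("HuggingFaceM4/idefics-80b-instruct")
with init_empty_weights():
    # No checkpoint is read here; torch_dtype must be honored on this path too
    # for the largest-layer estimate to come out in float32.
    model = AutoModel.from_config(config, torch_dtype=torch.float32)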
