
tests/test_cli.py::ModelEstimatorTester::test_no_split_modules fails after 84a67891 in Transformers #3362

Open
dvrogozh opened this issue Jan 23, 2025 · 3 comments

Comments

@dvrogozh
Contributor

With:

On:

  • A10 Nvidia
  • or Intel Data Center Max (PVC)

test_no_split_modules fails:

$ python3 -m pytest tests/test_cli.py::ModelEstimatorTester::test_no_split_modules
======================================= test session starts =======================================
platform linux -- Python 3.10.12, pytest-7.4.4, pluggy-1.5.0
rootdir: /home/dvrogozh/git/huggingface/accelerate
plugins: rich-0.2.0, hypothesis-6.123.13, xdist-3.6.1, asyncio-0.23.8, timeout-2.3.1
asyncio: mode=strict
collected 1 item

tests/test_cli.py F                                                                         [100%]

============================================ FAILURES =============================================
___________________________ ModelEstimatorTester.test_no_split_modules ____________________________

self = <test_cli.ModelEstimatorTester testMethod=test_no_split_modules>

    @require_transformers
    def test_no_split_modules(self):
        # idefics-80b-instruct has ["IdeficsDecoderLayer", "IdeficsGatedCrossAttentionLayer"]
        args = self.parser.parse_args(["HuggingFaceM4/idefics-80b-instruct", "--dtypes", "float32"])
        output = gather_data(args)
        # without factoring in `no_split` modules, the largest layer is 721420288 bytes
        assert output[0][1] != 721420288, "Largest layer calculation incorrect, did not factor in `no_split` modules."
        # the real answer is 3240165632 bytes
>       assert output[0][1] == 3240165632
E       assert 1620082944 == 3240165632

tests/test_cli.py:502: AssertionError
-------------------------------------- Captured stdout call ---------------------------------------
Loading pretrained config for `HuggingFaceM4/idefics-80b-instruct` from `transformers`...
===================================== short test summary info =====================================
FAILED tests/test_cli.py::ModelEstimatorTester::test_no_split_modules - assert 1620082944 == 3240165632
======================================== 1 failed in 5.88s ========================================

This issue appeared after this change in Transformers by @zucchini-nlp:

commit 84a6789145c3d728f2e405d31e9a35df5d74f05c
Author: Raushan Turganbay <[email protected]>
Date:   Mon Jan 13 13:42:08 2025 +0100

    Enable different torch dtype in sub models (#34873)

Here is the bisect log (the good starting revision 241c04d368 is actually Transformers v4.47.1):

$ git bisect log
git bisect start
# bad: [2c3a44f9a769e98597d62ecdc7383785318be5a2] Fix NoneType type as it requires py>=3.10 (#35843
git bisect bad 2c3a44f9a769e98597d62ecdc7383785318be5a2
# good: [241c04d36867259cdf11dbb4e9d9a60f9cb65ebc] [Whisper] patch float type on mps (#35295)
git bisect good 241c04d36867259cdf11dbb4e9d9a60f9cb65ebc
# good: [a5bb52847139bf6ad7489ac62a5fb6d0fa3d2ec6] Fix signatures for processing kwargs (#35105)
git bisect good a5bb52847139bf6ad7489ac62a5fb6d0fa3d2ec6
# good: [ed73ae210b87a86de5467d93bdbd4b5834f4f00c] NPU support SDPA (#35165)
git bisect good ed73ae210b87a86de5467d93bdbd4b5834f4f00c
# bad: [b0cdbd911941bdd596ac753fd33849ef83b1a841] Enhanced Installation Section in README.md (#3509
git bisect bad b0cdbd911941bdd596ac753fd33849ef83b1a841
# good: [633da1b10e61a3727ab2d4b0e186191b2590d092] [Idefics3] Move image features to same device as
git bisect good 633da1b10e61a3727ab2d4b0e186191b2590d092
# good: [bbc00046b9b746d5ea69a27a58e652fdf617c91c] Fix flaky `test_custom_4d_attention_mask` (#3560
git bisect good bbc00046b9b746d5ea69a27a58e652fdf617c91c
# bad: [84a6789145c3d728f2e405d31e9a35df5d74f05c] Enable different torch dtype in sub models (#3487
git bisect bad 84a6789145c3d728f2e405d31e9a35df5d74f05c
# good: [1e3c6c1f7d7fbb4862ff036a9bdf96e6ae2e2511] Skip `MobileNetV1ModelTest::test_batching_equiva
git bisect good 1e3c6c1f7d7fbb4862ff036a9bdf96e6ae2e2511
# good: [b8c34d97fcf6685a480e974f9c88bf04cb3088a9] Fix whisper compile (#35413)
git bisect good b8c34d97fcf6685a480e974f9c88bf04cb3088a9
# good: [87089176d9a205d69e7d4e152fc25950d77580d2] [`Phi`] bias should be True (#35650)
git bisect good 87089176d9a205d69e7d4e152fc25950d77580d2
# first bad commit: [84a6789145c3d728f2e405d31e9a35df5d74f05c] Enable different torch dtype in sub

From the change, I guess that the Accelerate test got affected by a changed size of one of the layers that the test is sensitive to. If so, this is unlikely to be an issue with the Transformers change itself; rather, the Accelerate test probably needs to be adjusted to account for the Transformers fix.
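
(Side note, not part of the original report: a quick check on the numbers from the failure above shows the reported size is roughly half of the expected one, which would be consistent with the largest no_split group being sized in float16 instead of the requested float32.)

expected = 3240165632  # bytes the test expects for the largest no_split group (float32)
observed = 1620082944  # bytes reported after commit 84a67891
print(observed / expected)  # ~0.5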

CC: @zucchini-nlp @ydshieh

@zucchini-nlp
Member

This might also be caused by Transformers: currently, when loading composite models like VLMs with from_pretrained(torch_dtype=torch.float32), the model is loaded in auto dtype. Only string dtypes work; my bad for not noticing it. The fix is in huggingface/transformers#35728 and I will merge it soon.
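
For illustration, a minimal sketch of the behavior described above (the model name is a placeholder, and the fallback to "auto" is as reported in this thread, not verified independently):

import torch
from transformers import AutoModel

# String dtype: respected, the composite model's weights end up in float32.
model = AutoModel.from_pretrained("some/composite-vlm", torch_dtype="float32")

# torch.dtype object: on the affected Transformers revisions, composite models
# reportedly fall back to torch_dtype="auto" and keep the checkpoint dtype.
model = AutoModel.from_pretrained("some/composite-vlm", torch_dtype=torch.float32)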

@dvrogozh
Contributor Author

The fix is in huggingface/transformers#35728

@zucchini-nlp: I've tried this PR. Unfortunately, it does not fix the reported issue in Accelerate.

@zucchini-nlp
Member

OK, I found the issue: it was related to the model being initialized from a config instead of being loaded from a checkpoint. I'll fix it on the Transformers side.
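
For context, a rough, simplified sketch (not the actual Accelerate source) of the code path in question: the memory estimator instantiates the model from its config on the meta device instead of loading a checkpoint, so any dtype handling that only runs inside from_pretrained is bypassed.

import torch
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("HuggingFaceM4/idefics-80b-instruct")
with init_empty_weights():
    # No checkpoint is read here; torch_dtype must be honored on this path too
    # for the largest-layer estimate to come out in float32.
    model = AutoModel.from_config(config, torch_dtype=torch.float32)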
