[Reland] ROCm CI (Infra + Skips) #1581

Open
wants to merge 19 commits into
base: main

Conversation

petrex
Collaborator

@petrex petrex commented Jan 17, 2025

This PR addresses the import error in CI and adds the infra changes needed to enable ROCm CI.

This pull request introduces the skip_if_rocm decorator across various test files to skip tests that are not yet supported on ROCm. The changes ensure that tests are conditionally skipped if ROCm is detected, improving the test suite's compatibility with different environments.

Key changes include:

Cherry-pick ROCm CI infra changes from #999

Introduction of skip_if_rocm decorator:

  • Added skip_if_rocm import in multiple test files to conditionally skip tests not supported on ROCm. (test/dtypes/test_affine_quantized.py, test/dtypes/test_floatx.py, test/float8/test_base.py, test/hqq/test_hqq_affine.py, test/integration/test_integration.py, test/kernel/test_galore_downproj.py, test/prototype/test_awq.py, test/prototype/test_low_bit_optim.py, test/prototype/test_splitk.py, test/quantization/test_galore_quant.py, test/quantization/test_marlin_qqq.py, test/sparsity/test_marlin.py, test/test_ops.py, test/test_s8s4_linear_cutlass.py, torchao/utils.py) [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15]

Application of skip_if_rocm decorator:

  • Applied @skip_if_rocm("ROCm development in progress") to multiple test functions to skip them when running on ROCm. (test/dtypes/test_affine_quantized.py, test/dtypes/test_floatx.py, test/float8/test_base.py, test/hqq/test_hqq_affine.py, test/integration/test_integration.py, test/kernel/test_galore_downproj.py, test/prototype/test_awq.py, test/prototype/test_low_bit_optim.py, test/prototype/test_splitk.py, test/quantization/test_galore_quant.py, test/quantization/test_marlin_qqq.py, test/sparsity/test_marlin.py) [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20]
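For readers unfamiliar with the pattern, a decorator of this kind can be sketched as follows. This is a minimal illustration, not the helper that lives in torchao/utils.py: the `detect` parameter and the `is_rocm` function are hypothetical names added here so the sketch runs without a GPU, and it raises `unittest.SkipTest` from the standard library where the real helper may call `pytest.skip` instead. Checking `torch.version.hip` is one common way to detect a ROCm build of PyTorch.

```python
import functools
import unittest


def is_rocm() -> bool:
    """Best-effort ROCm detection (hypothetical helper for this sketch).

    Returns False when torch is not installed, so the sketch stays runnable.
    """
    try:
        import torch
    except ImportError:
        return False
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    return getattr(torch.version, "hip", None) is not None


def skip_if_rocm(message=None, detect=is_rocm):
    """Skip the decorated test when `detect()` reports a ROCm build.

    `detect` is an assumption of this sketch; it makes the check injectable.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if detect():
                reason = "Skipping the test in ROCm"
                if message:
                    reason += f": {message}"
                raise unittest.SkipTest(reason)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@skip_if_rocm("ROCm development in progress")
def test_example():
    # Runs normally on non-ROCm builds, is reported as skipped on ROCm.
    return "ran"
```

The check happens at call time rather than at import time, so the decorated test is still collected and shows up as skipped in the report instead of disappearing from it.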

Module-level skips for ROCm:

  • Added module-level skips for ROCm in specific test files to skip all tests within the module if ROCm is detected. (test/test_ops.py, test/test_s8s4_linear_cutlass.py) [1] [2]
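A module-level skip differs from the decorator in that it stops the whole file before any test in it is defined. The sketch below shows the idea under stated assumptions: `rocm_detected` and `skip_module_if_rocm` are hypothetical names, and the skip is raised with `unittest.SkipTest` to keep the example dependency-free. In a pytest-based suite the equivalent call would be `pytest.skip(reason, allow_module_level=True)`.

```python
import unittest


def rocm_detected() -> bool:
    """Best-effort ROCm detection (hypothetical helper for this sketch)."""
    try:
        import torch
    except ImportError:
        return False
    # torch.version.hip is None on CUDA and CPU builds of PyTorch.
    return getattr(torch.version, "hip", None) is not None


def skip_module_if_rocm(reason="ROCm development in progress", detect=rocm_detected):
    """Raise SkipTest so the test runner skips the entire importing module."""
    if detect():
        raise unittest.SkipTest(reason)


# Placed near the top of a test file, before any test definitions.
skip_module_if_rocm()


def test_only_runs_off_rocm():
    return "ran"
```

This suits files such as kernel tests where every case depends on functionality that is not yet available on ROCm, so per-test decorators would only add noise.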


pytorch-bot bot commented Jan 17, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1581

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures

As of commit 89f0bc1 with merge base ea7910e (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label Jan 17, 2025
@petrex petrex added the topic: not user facing label Jan 17, 2025

pytorch-bot bot commented Jan 17, 2025

Warning: Unknown label ciflow/rocm.
Currently recognized labels are

  • ciflow/benchmark
  • ciflow/tutorials

Please add the new label to .github/pytorch-probot.yml

@petrex petrex self-assigned this Jan 17, 2025
@petrex petrex requested a review from jainapurva January 17, 2025 18:26
@petrex petrex requested a review from andrewor14 January 17, 2025 18:27
@andrewor14 andrewor14 changed the title Skip ROCm Tests in CI Fix imports after skipping ROCm Tests in CI Jan 17, 2025
@andrewor14 andrewor14 changed the title Fix imports after skipping ROCm Tests in CI [Reland] Skip ROCm Tests in CI Jan 17, 2025
@andrewor14
Contributor

Thanks @petrex. Would you also like to reopen #999 or should someone else do it?

@petrex
Collaborator Author

petrex commented Jan 17, 2025

Thanks @andrewor14, I will work with the AMD team on that.

@petrex
Collaborator Author

petrex commented Jan 21, 2025

@amdfaa I just cherry-picked your infra changes into this PR so we can get a clearer CI signal. Please help review the changes. Thanks.

@petrex petrex requested a review from amdfaa January 21, 2025 22:08
@petrex petrex changed the title [Reland] Skip ROCm Tests in CI [Reland] ROCm CI (Infra + Skips) Jan 21, 2025
* Enable ROCM in CI

---------

Co-authored-by: amdfaa <[email protected]>
@petrex petrex requested a review from msaroufim January 22, 2025 19:21
Collaborator

@amdfaa amdfaa left a comment


lgtm from the infra side

@msaroufim
Member

msaroufim commented Jan 22, 2025

The breakage on CUDA doesn't seem related to your changes. It seems to come from this test: FAILED test/quantization/test_quant_api.py::TestQuantFlow::test_quantized_tensor_subclass_int8_dyn_quant - torch._inductor.exc.CppCompileError: C++ compile error, in which case @jerryzh168 might need to take a look.

@jithunnair-amd
Collaborator

@petrex Looks like more fixes/skips are needed: https://github.com/pytorch/ao/actions/runs/12934492712/job/36084046951?pr=1581
=== 117 failed, 1493 passed, 518 skipped, 49 warnings in 3551.22s (0:59:11) ====

Labels
ciflow/rocm, CLA Signed, module: rocm, topic: not user facing

6 participants