Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Move SYCL header E2E checker test to compile-time #15835

Merged
merged 4 commits into from
Oct 25, 2024

Conversation

sarnex
Copy link
Contributor

@sarnex sarnex commented Oct 23, 2024

We have all the info at compile time, no need to waste resources to run this on a device.

Also add an extra step in the Linux build CI to run these tests specifically if we don't run check-sycl, as sometimes we don't run it if there are not changes to SYCL.

We require all tests checking anything about E2E files be in sycl/test/e2e_test_requirements. Anything in there will get run at compile time.

Also fix a bug I exposed in the XFAIL checker test.

@sarnex sarnex marked this pull request as ready for review October 23, 2024 15:38
@sarnex sarnex requested a review from a team as a code owner October 23, 2024 15:38
@sarnex sarnex changed the title [SYCL] Move SYCL header E2E test to compile-time [SYCL] Move SYCL header E2E checker test to compile-time Oct 23, 2024
Copy link
Contributor

@KornevNikita KornevNikita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@aelovikov-intel aelovikov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put it in e2e because

  1. We have "standalone" e2e mode when it's separate cmake
  2. We might be skipping check-sycl for e2e-only changes
  3. I think it makes sense that the issue in test-e2e is reported in check-sycl-e2e

@sarnex
Copy link
Contributor Author

sarnex commented Oct 23, 2024

I see a few problems:

  1. We report the error much later than we can detect the problem. We could report the problem in 10 seconds but we take maybe 1 hour or more to do it.
  2. We run the same test on all devices for no reason (the test is trivial so I doubt this has a real effect though)
  3. People fighting with the test have to wait for build, CI device allocation and E2E to run again which takes forever and rerun it over and over, wasting resources. Right now we lost our PVC build machines so we have to use the arc/amdgpu ones for build, so resources are super strained.

If we update CI to not re-run build when there are E2E only changes, that would fix the problem in 3) about waiting for build but then the users still has to wait for E2E allocation + full E2E run to see the problem. We could also move this test back to E2E if we ever do get CI that does this I guess.

It's true that this would require check-sycl to be run when changing E2E tests, but once we get our good build machines back we can make check-sycl a separate task as part of the e2e-changes-only logic and it will absolutely finish before any E2E tests, and should take about a minute or two.

To be fair this test isn't a big deal and I doubt people fight with it often, so if you feel strongly I'll close this.

@sarnex
Copy link
Contributor Author

sarnex commented Oct 23, 2024

Going to push a change to address Andrei's concerns in a bit

Edit: Hitting some weird problems, working on it.

@sarnex sarnex requested a review from a team as a code owner October 24, 2024 20:10
// RUN: -A 1 --include=*.c --include=*.cpp --no-group-separator | \
// RUN: grep -v "XFAIL:" | \
// RUN: grep -Pv "XFAIL-TRACKER:\s+(?:https://github.com/[\w\d-]+/[\w\d-]+/issues/[\d]+)|(?:[\w]+-[\d]+)" > %t | \
Copy link
Contributor Author

@sarnex sarnex Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it took me like 4 hours to find why this test was randomly failing, it's because we write to%t on disk and then immediately cat it after a pipe, seems sometimes we get a partially written/empty file. i don't know enough about bash to say specifically what part is async/buffered/whatever, but we can easily do something cleaner that works consistently

no need to pipe here, just use separate commands

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing that!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used pipes before we added the output to the file, so yes we should have updated this when adding an output to the file.

Copy link
Contributor Author

@sarnex sarnex Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, this one was really insane to debug, i had to make the test infinite loop and connect to one of the CI machines so i could manually run the commands, i couldnt repro locally, probably because local system was too fast writing to disc?

@sarnex
Copy link
Contributor Author

sarnex commented Oct 24, 2024

@aelovikov-intel Hopefully the latest commit addresses your feedback

@aelovikov-intel
Copy link
Contributor

@aelovikov-intel Hopefully the latest commit addresses your feedback

Yes it does, thanks! Can you update the description to summarize the change too please?

@sarnex
Copy link
Contributor Author

sarnex commented Oct 24, 2024

Done!

@sarnex sarnex merged commit e4b8420 into intel:sycl Oct 25, 2024
14 of 32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants