Kokkos + KokkosKernels Promotion To 4.3.0 #12879
The majority of the changes were pre-tested by the AT in #12863 |
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING
Build Information
Test Name: Trilinos_PR_gcc-8.3.0
Test Name: Trilinos_PR_gcc-8.3.0-serial
Test Name: Trilinos_PR_gcc-8.3.0-debug
Test Name: Trilinos_PR_clang-11.0.1
Test Name: Trilinos_PR_python3
Test Name: Trilinos_PR_cuda-11.4.2-uvm-off
Test Name: Trilinos_PR_intel-2021.3
Test Name: Trilinos_PR_cuda-11.4.20-uvm
Pull Request Author: ndellingwood |
Status Flag 'Pull Request AutoTester' - Error: Jenkins Jobs - Error: [Jenkins] Cannot retrieve build running status on build (FATAL: ERROR : [jwrap: job_instance.get_build(self.bn):105:/home/trilinos/workspace/Trilinos_autotester_driver_inst_1/autotester/support/jenkins_support.py] - General Exception: (502 Server Error: Proxy Error for url: https://do.sandia.gov/trilinos-ci/job/Trilinos_PR_gcc-8.3.0-serial/2397/api/python?depth=1)) |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED. Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED |
I scanned the compilation errors and looked back at the code changes. I found a couple of artifacts that showed up following the merge-to-master before the snapshot (unrelated to merge conflicts). I'll fix these in their respective repo PRs and update the snapshots in this PR |
8102a90 to f0b952d
Merge artifacts fixed in the release PRs kokkos/kokkos#6908 and kokkos/kokkos-kernels#2163, and snapshots updated here |
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED. Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED |
CDash failure summary: The clang/11 job https://trilinos-cdash.sandia.gov/viewBuildError.php?buildid=1454586 failed with reports of:
@sebrowne @trilinos/framework is there a label available to clean the ccache? If not, can you give guidance?
The intel/2021.3 and gnu/8.3.0 jobs reported two panzer test failures:
I didn't see those failures with the test PR #12863. This is a list of the changes since that PR:
Kokkos: previous test PR sha: kokkos/kokkos@772e745
KokkosKernels: previous test PR sha: kokkos/kokkos-kernels@f909de6
The panzer test has some noisy output. Maybe the change in kokkos/kokkos-kernels#2157 was sufficient to silence the issues in the MueLu tests but introduced some unexpected extra output in the panzer tests? Launching a build to check |
After running some manual builds and sanity testing, it looks like a Trilinos change between the submission of #12863 (Mar 26) and the submission+testing of this PR (Apr 2) is triggering the failure of the Panzer tests.
Next I'm going to bisect on merged PRs between the two referenced above, skipping the AT/CI related merges. This is the list of PRs I'm checking, merged between #12858 and #12870: #12868 - skip, CI |
I bisected the failures to changes merged with #12852. I retested with changing the guarded … I attached a diff of what I tested for clarity: |
@ndellingwood I'm trying to replicate and then debug this now. Changing the KK version threshold to 40399 is basically equivalent to reverting #12852, right? Do the panzer tests pass with KK develop, which has version=40399 now? If not, I would say just change … |
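The threshold reasoning in the comment above can be sketched in a few lines. This is only an illustration, assuming the version number is encoded as major * 10000 + minor * 100 + patch (which is consistent with the 40399 value quoted for develop). The names `encode_version`, `guard_enabled`, and `check_threshold_reasoning` are hypothetical and are not part of Kokkos, KokkosKernels, or Trilinos.

```cpp
#include <cassert>

// Assumed encoding: major * 10000 + minor * 100 + patch.
// Consistent with the 40399 quoted in the thread, but this helper is hypothetical.
constexpr int encode_version(int major, int minor, int patch) {
  return major * 10000 + minor * 100 + patch;
}

// Hypothetical stand-in for a preprocessor guard of the form
//   #if SOME_VERSION_MACRO >= threshold
// Returns true when the guarded (new) code path would be compiled in.
constexpr bool guard_enabled(int library_version, int threshold) {
  return library_version >= threshold;
}

// Runtime restatement of the reasoning in the thread.
inline void check_threshold_reasoning() {
  const int release = encode_version(4, 3, 0);   // 40300
  const int develop = encode_version(4, 3, 99);  // 40399
  assert(!guard_enabled(release, 40399));  // release: guarded path is compiled out
  assert(guard_enabled(develop, 40399));   // develop: guarded path is still compiled in
}
```

Under these assumptions, raising the threshold to 40399 compiles the guarded path out for the 4.3.0 release, which behaves like a revert of the guarded change, while a build against develop still exercises that path. That would explain why testing against develop is a useful check here.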
Thanks for the response @brian-kelley , I'll share a reproducer configuration with you (slight modification to the genconfig fragments posted in cdash)
yeah, basically :(
Build is almost complete, I'll let you know shortly |
@ndellingwood I would be surprised if KK develop fixes this - I just looked at a diff between develop and the release candidate, and the changes should have no effect unless a) multiple streams are used with a single spmv handle, or b) rocsparse and BSR matrices are used together. But obviously those aren't in play here because the build is serial backend. |
@brian-kelley you're right, my build just finished with the develop branch and the tests still fail there, as expected |
…d89c0d From repository at [email protected]:kokkos/kokkos.git At commit: commit 47a50ac3ca9c93746fa6c23629d9fe55ecd89c0d Author: Nathan Ellingwood <[email protected]> Date: Mon Apr 1 18:21:56 2024 -0600 Update master_history.txt for 4.3.0
…e042aa36ef3020 From repository at [email protected]:kokkos/kokkos-kernels.git At commit: commit d8e2b21ce71363e689eaa81fc8e042aa36ef3020 Author: Nathan Ellingwood <[email protected]> Date: Mon Apr 1 18:22:20 2024 -0600 Update master_history.txt for 4.3.0
f0b952d to a0dec6e
- resolves failures in panzer tests detected during testing of #12879 Co-authored-by: brian-kelley <[email protected]>
Kokkos and KokkosKernels snapshots have been updated, and @brian-kelley provided a patch to resolve the panzer example failures (sha d21345a). Ready for retest |
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED. Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED |
@brian-kelley - Thanks for getting this fixed! Could you comment on what you had to do to get the panzer tests passing? Just wondering if there is something we should look at from the panzer side for future kokkos releases. |
@rppawlo The output vector y to a … It's not a Panzer issue though. We do support this beta=0 case in KokkosKernels/Tpetra; we just have to test it better in KokkosKernels: kokkos/kokkos-kernels#2166. On the other hand, I'm not sure where these NaNs were coming from originally, but getting rid of them would have also fixed these tests |
Thanks @brian-kelley ! The nans are most likely from panzer - added specifically to catch just this type of error. See the comments on the far right in the code block below:
|
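The beta=0 case discussed in the two comments above can be illustrated with a small stand-alone sketch. It is not the KokkosKernels or Tpetra implementation. It only shows, for a hypothetical CSR type and hypothetical `spmv` / `spmv_naive` functions, why an output vector pre-filled with NaN must be overwritten rather than scaled when beta is zero: in IEEE arithmetic, 0 * NaN is NaN.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Minimal CSR matrix; hypothetical type for illustration only.
struct CsrMatrix {
  std::size_t num_rows;
  std::vector<std::size_t> row_ptr;  // size num_rows + 1
  std::vector<std::size_t> col_idx;  // column index of each stored entry
  std::vector<double> values;        // value of each stored entry
};

// y = beta * y + alpha * A * x, where beta == 0 means "overwrite y".
void spmv(double alpha, const CsrMatrix& A, const std::vector<double>& x,
          double beta, std::vector<double>& y) {
  assert(A.row_ptr.size() == A.num_rows + 1);
  assert(y.size() == A.num_rows);
  for (std::size_t i = 0; i < A.num_rows; ++i) {
    double sum = 0.0;
    for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
      sum += A.values[k] * x[A.col_idx[k]];
    }
    // The old y[i] is never read when beta == 0, so NaN sentinels cannot leak.
    y[i] = (beta == 0.0) ? alpha * sum : beta * y[i] + alpha * sum;
  }
}

// Naive variant that always reads y; with beta == 0 it propagates NaN.
void spmv_naive(double alpha, const CsrMatrix& A, const std::vector<double>& x,
                double beta, std::vector<double>& y) {
  assert(A.row_ptr.size() == A.num_rows + 1);
  assert(y.size() == A.num_rows);
  for (std::size_t i = 0; i < A.num_rows; ++i) {
    double sum = 0.0;
    for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
      sum += A.values[k] * x[A.col_idx[k]];
    }
    y[i] = beta * y[i] + alpha * sum;  // 0 * NaN stays NaN
  }
}
```

A test in this style fills y with NaN before the call, as the Panzer sentinel values apparently do, and then checks that no NaN survives a beta=0 product.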
CDash summary: Two jobs failed at compilation (Trilinos_PR_gcc-8.3.0, Trilinos_PR_gcc-8.3.0-debug). It looks like changes in sha d21345a left a potentially unused …
I'm guessing the variable just needs to be moved to a guarded region, taking a look now |
Just pushed b1a4376 which should resolve the … |
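The kind of fix described above (moving a variable into the guarded region that uses it) can be sketched generically. The macro `DEMO_NEW_PATH` and the functions `combine` and `check_combine` are hypothetical and do not come from commit b1a4376. The sketch only shows the general pattern, assuming a build that treats unused-variable warnings as errors.

```cpp
#include <cassert>

// Hypothetical feature switch standing in for a version guard.
#ifndef DEMO_NEW_PATH
#define DEMO_NEW_PATH 0
#endif

// Problematic shape, shown as a comment so this file compiles cleanly:
//   const int scale = 2;          // unused whenever DEMO_NEW_PATH == 0
//   #if DEMO_NEW_PATH
//     return scale * (a + b);
//   #else
//     return a + b;
//   #endif

// Fixed shape: the variable is declared inside the guard that uses it,
// so it does not exist in builds where the guarded path is compiled out.
int combine(int a, int b) {
#if DEMO_NEW_PATH
  const int scale = 2;
  return scale * (a + b);
#else
  return a + b;
#endif
}

// Small self-check covering whichever path was compiled in.
inline void check_combine() {
  assert(combine(1, 2) == (DEMO_NEW_PATH ? 6 : 3));
}
```

An alternative would be to mark the variable `[[maybe_unused]]` (C++17), but moving the declaration keeps the unguarded path free of dead code.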
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED. Pull Request Auto Testing has PASSED |
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ lucbv ]! |
Status Flag 'Pull Request AutoTester' - AutoMerge IS ENABLED, but the Label AT: AUTOMERGE is not set. Either set Label AT: AUTOMERGE or manually merge the PR... |
@trilinos/kokkos @trilinos/kokkos-kernels
Motivation
Snapshot Kokkos Ecosystem 4.3.0 to Trilinos
Testing
Various nightly testing jobs of release candidate, AT pre-test with #12863; AT