Random mass test failures in PR builds due to failure to load libcuda.so starting at least by 2024-11-20 #13730
Labels
impacting: tests
The defect (bug) is primarily a test failure (vs. a build failure)
PA: Framework
Issues that fall under the Trilinos Framework Product Area
type: bug
The primary issue is a bug in Trilinos code or tests
CC: @trilinos/framework, @sebrowne, @achauphan
Next Action Status
Description
As shown in this query (click "Show Matching Output" in the upper right), 11144 tests failed, including 2839 unique tests, in the unique GenConfig build:
rhel8_sems-cuda-11.4.2-gnu-10.1.0-openmpi-4.1.6_release_static_Volta70_no-asan_complex_no-fpic_mpi_pt_no-rdc_uvm_deprecated-on_no-package-enables
The failures started on testing day 2024-11-20.
The specific set of CDash builds impacted were:
PR-13622-test-rhel8_sems-cuda-11.4.2-gnu-10.1.0-openmpi-4.1.6_release_static_Volta70_no-asan_complex_no-fpic_mpi_pt_no-rdc_uvm_deprecated-on_no-package-enables-859
PR-13622-test-rhel8_sems-cuda-11.4.2-gnu-10.1.0-openmpi-4.1.6_release_static_Volta70_no-asan_complex_no-fpic_mpi_pt_no-rdc_uvm_deprecated-on_no-package-enables-865
PR-13715-test-rhel8_sems-cuda-11.4.2-gnu-10.1.0-openmpi-4.1.6_release_static_Volta70_no-asan_complex_no-fpic_mpi_pt_no-rdc_uvm_deprecated-on_no-package-enables-1039
PR-13715-test-rhel8_sems-cuda-11.4.2-gnu-10.1.0-openmpi-4.1.6_release_static_Volta70_no-asan_complex_no-fpic_mpi_pt_no-rdc_uvm_deprecated-on_no-package-enables-1041
The most recent failures for the last PR listed, #13715, were from 2025-01-10.
The failures looked like:
Current Status on CDash
Run the above query, adjusting the "Begin" and "End" dates to match today or any other date range, or just click "CURRENT" in the top bar to see results for the current testing day.
Steps to Reproduce
See:
If you can't figure out what commands to run to reproduce the problem given this documentation, then please post a comment here and we will give you the exact minimal commands.
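Since the failures are attributed to a failure to load libcuda.so, a quick first check on a suspect build node is whether the dynamic loader can open that library at all. The following is a minimal sketch in Python, not part of the Trilinos reproduction instructions. The helper name `can_load` is hypothetical, and probing `libcuda.so.1` in addition to `libcuda.so` is an assumption about how the driver library may be installed on the node.

```python
import ctypes


def can_load(lib_name):
    """Try to open a shared library through the dynamic loader.

    Returns (True, "") on success, or (False, <loader error text>) on failure.
    """
    try:
        ctypes.CDLL(lib_name)
        return True, ""
    except OSError as exc:
        # The loader's message usually states why the open failed,
        # e.g. the file was not found on the library search path.
        return False, str(exc)


if __name__ == "__main__":
    # Library names to probe; the second name is an assumption (see lead-in).
    for name in ("libcuda.so", "libcuda.so.1"):
        ok, err = can_load(name)
        print(f"{name}: {'loaded' if ok else 'FAILED: ' + err}")
```

If both names fail on a node where the tests fail but load on a node where they pass, that would point at the node's driver installation or library search path rather than at the tests themselves.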