Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tabale reproducer #281

Merged
merged 17 commits into from
Sep 18, 2024
Merged

Tabale reproducer #281

merged 17 commits into from
Sep 18, 2024

Conversation

liu15
Copy link
Collaborator

@liu15 liu15 commented Sep 11, 2024

To reproduce crash on rzadams:

mkdir build_rocm
cd build_rocm
cmake -DCHAI_ENABLE_REPRODUCERS=1 -C ../configs/lc/toss_4_x86_64_ib_cray/amdclang.cmake ..
flux alloc -N 1 -n 1 -g1 make -j 40
flux alloc -N 1 -n 1 -g1 ./bin/managed_ptr_multiple_inheritance_reproducer.exe

  • Note that the "this" pointer in YofXfromRTTable1D::RootFromBaseX differs from the "this" pointer in other YofXfromRTTable1D methods.
  • Interestingly , the crash goes away if GetNumStrings() is removed

The rzansel case does not crash and has consistent pointer addresses in YofXfromRTTable1D. This can be reproduced with

mkdir build_cuda
cd build_cuda
cmake -DCHAI_ENABLE_REPRODUCERS=1 -C ../configs/lc/blueos_3_ppc64le_ib_p9/nvcc_clang.cmake ..
lalloc 1 make -j 40
lalloc 1 ./bin/managed_ptr_multiple_inheritance_reproducer.exe

Copy link
Member

@adayton1 adayton1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@adayton1 adayton1 requested a review from dtaller September 17, 2024 14:41
@@ -71,8 +71,6 @@ cmake_dependent_option(CHAI_ENABLE_DOCS "Build CHAI docs" On
"ENABLE_DOCS" Off)
cmake_dependent_option( CHAI_ENABLE_GMOCK "Build CHAI with gmock" On
"ENABLE_GMOCK" Off )
cmake_dependent_option(CHAI_ENABLE_REPRODUCERS "Build CHAI reproducers" On
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ENABLE_REPRODUCERS is not a variable defined anywhere (unlike ENABLE_TESTS for example, which is a BLT variable). Even if it was, I don't think it makes sense for it to be a dependent variable. It is already an option in cmake/SetupChaiOptions.cmake.

@dtaller
Copy link
Contributor

dtaller commented Sep 17, 2024

To reproduce crash on rzadams:

mkdir build_rocm cd build_rocm cmake -DCHAI_ENABLE_REPRODUCERS=1 -C ../configs/lc/toss_4_x86_64_ib_cray/amdclang.cmake .. flux alloc -N 1 -n 1 -g1 make -j 40 flux alloc -N 1 -n 1 -g1 ./bin/managed_ptr_multiple_inheritance_reproducer.exe

* Note that the "this" pointer in YofXfromRTTable1D::RootFromBaseX differs from the "this" pointer in other YofXfromRTTable1D methods.

* Interestingly , the crash goes away if GetNumStrings() is removed

The rzansel case does not crash and has consistent pointer addresses in YofXfromRTTable1D. This can be reproduced with

mkdir build_cuda cd build_cuda cmake -DCHAI_ENABLE_REPRODUCERS=1 -C ../configs/lc/blueos_3_ppc64le_ib_p9/nvcc_clang.cmake .. lalloc 1 make -j 40 lalloc 1 ./bin/managed_ptr_multiple_inheritance_reproducer.exe

@liu15 . I would put somthing like this (explanation of the reproducer and what systems it fails on and how) as a comment at the top of the reproducer or in a README file somewhere. Those details would be useful to figuring out what it wrong, so it would be nice to have it documented there

Copy link
Contributor

@dtaller dtaller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides adding a readme or big comment explaining the reproducer (as I suggested in my last comment), this looks good.

@adayton1 adayton1 merged commit 6e1388b into develop Sep 18, 2024
7 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants