Update toolchains on tioga, lassen, ruby and poodle #275

Merged: adayton1 merged 18 commits into develop from woptim/rsc-update on Sep 24, 2024

@adrienbernede adrienbernede commented Aug 7, 2024

  • Remove Intel 19
  • Update to Intel 2023
  • Update corona to ROCm 6.0.2 -> same error as with tioga, needs ROCm 6.1.x, so reverted to ROCm 5.7.1
  • Update tioga to ROCm 6.1.2
  • Remove CUDA 10 job (blueos system default is now 11.2.0)

Errors

  • gcc 11.2.1 + cuda 11.8.0
[ 69%] Linking CXX executable ../../bin/managed_array_unit_tests
tmpxft_0001363d_00000000-6_managed_array_tests.cudafe1.stub.c:35:1: error: redefinition of 'const char _ZTSZN4chai12ManagedArrayIdE8allocateEmNS_14ExecutionSpaceERKSt8functionIFvPKNS_13PointerRecordENS_6ActionES2_EEEd_UlS6_S7_S2_E_ []'
tmpxft_0001363d_00000000-6_managed_array_tests.cudafe1.stub.c:35:1: note: 'const char _ZTSZN4chai12ManagedArrayIdE8allocateEmNS_14ExecutionSpaceERKSt8functionIFvPKNS_13PointerRecordENS_6ActionES2_EEEd_UlS6_S7_S2_E_ [125]' previously defined here
tmpxft_0001363d_00000000-6_managed_array_tests.cudafe1.stub.c:35:1: error: redefinition of 'const char _ZTSZN4chai12ManagedArrayIfE8allocateEmNS_14ExecutionSpaceERKSt8functionIFvPKNS_13PointerRecordENS_6ActionES2_EEEd_UlS6_S7_S2_E_ []'
tmpxft_0001363d_00000000-6_managed_array_tests.cudafe1.stub.c:35:1: note: 'const char _ZTSZN4chai12ManagedArrayIfE8allocateEmNS_14ExecutionSpaceERKSt8functionIFvPKNS_13PointerRecordENS_6ActionES2_EEEd_UlS6_S7_S2_E_ [125]' previously defined here
gmake[2]: *** [tests/integration/CMakeFiles/managed_array_tests.dir/build.make:76: tests/integration/CMakeFiles/managed_array_tests.dir/managed_array_tests.cpp.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:1724: tests/integration/CMakeFiles/managed_array_tests.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
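
One way to see which entity the redefinition error names is to demangle the symbol. The sketch below assumes a GCC or Clang toolchain that provides `abi::__cxa_demangle` in `<cxxabi.h>`; the helper name `demangle` and the constant `kRedefinedSymbol` are illustrative only. Under the Itanium C++ ABI, the `_ZTS` prefix marks a typeinfo name, `d_` marks a default argument, and `Ul...E_` marks a lambda, which suggests the duplicated entity is the type name of a lambda used as a default argument of `chai::ManagedArray<double>::allocate`.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <string>

// Demangle an Itanium-ABI symbol name; returns an empty string on failure.
// (Illustrative helper, not part of CHAI.)
std::string demangle(const char* mangled) {
    assert(mangled != nullptr);
    int status = 0;
    char* raw = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string result = (status == 0 && raw != nullptr) ? raw : "";
    std::free(raw);  // the returned buffer is malloc-allocated
    return result;
}

// Symbol copied from the first "redefinition" error in the log above.
const char* const kRedefinedSymbol =
    "_ZTSZN4chai12ManagedArrayIdE8allocateEmNS_14ExecutionSpaceERKSt8function"
    "IFvPKNS_13PointerRecordENS_6ActionES2_EEEd_UlS6_S7_S2_E_";
```

Printing `demangle(kRedefinedSymbol)` from a small `main`, or piping the symbol through `c++filt`, should give the readable name.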

@adayton1 adayton1 left a comment

LGTM

@adayton1 adayton1 requested a review from liu15 August 8, 2024 21:32
@adrienbernede adrienbernede changed the title Update toolchains on tioga, lassen, ruby and poodle [WIP] Update toolchains on tioga, lassen, ruby and poodle Aug 9, 2024
@adrienbernede

@adayton1 regarding the failure in the gcc 11.2.1 + cuda 11.8 build, do you think I should ignore the failure or wait for a fix?

adayton1 commented Sep 5, 2024

> @adayton1 regarding the failure in the gcc 11.2.1 + cuda 11.8 build, do you think I should ignore the failure or wait for a fix?

I'm looking into this again.

adayton1 commented Sep 5, 2024

I think nvcc preprocesses std::function incorrectly. This isn't the first time I've seen issues with nvcc and the std library header.

@adayton1

> I think nvcc preprocesses std::function incorrectly. This isn't the first time I've seen issues with nvcc and the std library header.

I think we might just have to allow failure for now on this job.

adrienbernede commented Sep 16, 2024

@adayton1 I was trying to understand the issue better, and here is my conclusion:

In src/chai/ManagedArray.inl at line 444,

template<typename T>
template<typename Idx>
CHAI_INLINE
CHAI_HOST_DEVICE T& ManagedArray<T>::operator[](const Idx i) const {
  return m_active_pointer[i];
}

This operator appears to be defined for both __host__ and __device__, while there exists a constructor of the class ManagedArray that is only defined for __host__.
In src/chai/ManagedArray.inl at line 82:

template<typename T>
CHAI_INLINE
CHAI_HOST ManagedArray<T>::ManagedArray(PointerRecord* record, ExecutionSpace space):
[...]

I think this explains the warning.
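
As background for the annotations discussed above, the sketch below shows one common way such host/device macros are defined and applied. The names `DEMO_HOST`, `DEMO_HOST_DEVICE`, `DemoArray`, and `demo_read_second` are hypothetical; CHAI's actual `CHAI_HOST` and `CHAI_HOST_DEVICE` definitions may differ. When compiled without nvcc the macros expand to nothing, so the example builds as plain C++.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macros; CHAI's real CHAI_HOST / CHAI_HOST_DEVICE may differ.
#if defined(__CUDACC__)
#define DEMO_HOST __host__
#define DEMO_HOST_DEVICE __host__ __device__
#else
#define DEMO_HOST
#define DEMO_HOST_DEVICE
#endif

template <typename T>
class DemoArray {
public:
    // Host-only constructor: under nvcc it cannot be called from device code.
    DEMO_HOST DemoArray(T* data, std::size_t size)
        : m_data(data), m_size(size) {}

    // Element access callable from both host and device code.
    template <typename Idx>
    DEMO_HOST_DEVICE T& operator[](const Idx i) const {
        return m_data[i];
    }

    DEMO_HOST_DEVICE std::size_t size() const { return m_size; }

private:
    T* m_data;
    std::size_t m_size;
};

// Builds a small array on the host and reads its second element.
inline float demo_read_second() {
    float values[3] = {1.0f, 2.0f, 3.0f};
    DemoArray<float> array(values, 3);
    assert(array.size() == 3u);
    return array[1];
}
```

The sketch only illustrates the annotation pattern; it does not reproduce the nvcc warning or the redefinition error.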

Then, the error itself appears to be related in the following way:

tmpxft_00021924_00000000-6_managed_array_tests.cudafe1.stub.c:35:1: error: redefinition of 'const char _ZTSZN4chai12ManagedArrayIdE8allocateEmNS_14ExecutionSpaceERKSt8functionIFvPKNS_13PointerRecordENS_6ActionES2_EEEd_UlS6_S7_S2_E_ []'
tmpxft_00021924_00000000-6_managed_array_tests.cudafe1.stub.c:35:1: note: 'const char _ZTSZN4chai12ManagedArrayIdE8allocateEmNS_14ExecutionSpaceERKSt8functionIFvPKNS_13PointerRecordENS_6ActionES2_EEEd_UlS6_S7_S2_E_ [125]' previously defined here

It looks like there is a mismatch between class implementations.
In tests/unit/managed_array_unit_tests.cpp at line ,

TEST(ManagedArray, SpaceConstructorCPU)
{
  chai::ManagedArray<float> array(10, chai::CPU);
  ASSERT_EQ(array.size(), 10u);
  array.free();
}

I think the intent is for the test to instantiate

CHAI_HOST_DEVICE ManagedArray<T>::ManagedArray(
    size_t elems,
    ExecutionSpace space) :
  ManagedArray()

defined at line 53 of src/chai/ManagedArray.inl, but the error clearly refers to the implementation involved in the above warning, the __host__ only one (line 84):

template<typename T>
CHAI_INLINE
CHAI_HOST ManagedArray<T>::ManagedArray(PointerRecord* record, ExecutionSpace space):
[...]

Could it be that PointerRecord* and size_t are not distinguishable? I’m already beyond my limits... ;)
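
On the question of whether `PointerRecord*` and `size_t` can be confused: in standard C++ overload resolution, a non-zero integer literal such as `10` cannot convert to a pointer, so a call like the one in the test should select the `size_t` overload. The sketch below checks this with hypothetical stand-in types (`Array`, `PointerRecord`, and `ExecutionSpace` here are not CHAI's real definitions). Whether nvcc's generated stub code behaves differently is a separate matter that this sketch does not address.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins; these are not CHAI's real types.
struct PointerRecord {};
enum ExecutionSpace { CPU, GPU };

struct Array {
    enum Picked { SizeCtor, RecordCtor };
    Picked picked;

    // Overload taking an element count.
    Array(std::size_t /*elems*/, ExecutionSpace /*space*/)
        : picked(SizeCtor) {}

    // Overload taking an existing record.
    Array(PointerRecord* /*record*/, ExecutionSpace /*space*/)
        : picked(RecordCtor) {}
};

// The literal 10 is not a null pointer constant, so only the size_t
// overload is viable for this call.
inline Array::Picked pick_from_count() {
    Array a(10, CPU);
    return a.picked;
}

// A real pointer argument only matches the PointerRecord* overload.
inline Array::Picked pick_from_record() {
    PointerRecord record;
    Array a(&record, CPU);
    return a.picked;
}

// Note: Array a(0, CPU) would be ambiguous, because the literal 0 converts
// equally well to std::size_t and to a null PointerRecord*.
```

Only the literal `0` is ambiguous between the two overloads, which does not apply to the `array(10, chai::CPU)` call in the test.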

@adrienbernede adrienbernede changed the title [WIP] Update toolchains on tioga, lassen, ruby and poodle Update toolchains on tioga, lassen, ruby and poodle Sep 24, 2024
@adrienbernede

@adayton1 You may now merge the PR.

@adayton1 adayton1 merged commit ec3fe01 into develop Sep 24, 2024
9 checks passed
@adayton1 adayton1 deleted the woptim/rsc-update branch September 24, 2024 15:37