GH-45371: [C++] Fix data race in SimpleRecordBatch::columns #45372

Open: colin-r-schultz wants to merge 3 commits into main

@colin-r-schultz colin-r-schultz commented Jan 28, 2025

Rationale for this change

GH-45371

What changes are included in this PR?

Use std::atomic_compare_exchange to initialize boxed_columns_[i] so they are correctly written only once. This means that a reference to boxed_columns_ is safe to read after each element has been initialized.
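As a rough illustration of the write-once initialization described above, here is a minimal sketch built around a hypothetical `LazyCache` class. It is not Arrow's code: the class, its members, and the doubled-integer payload are invented for the example. It also assumes the `std::atomic_compare_exchange_strong` overload for `std::shared_ptr`; the description only says `std::atomic_compare_exchange`, so the exact variant is a guess.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a class that lazily boxes its elements.
class LazyCache {
 public:
  explicit LazyCache(std::vector<int> inputs)
      : inputs_(std::move(inputs)), boxed_(inputs_.size()) {}

  // Returns the boxed value for slot i, creating it on first use.
  std::shared_ptr<int> Get(std::size_t i) const {
    assert(i < boxed_.size());
    std::shared_ptr<int> cached = std::atomic_load(&boxed_[i]);
    if (cached) return cached;

    std::shared_ptr<int> fresh = std::make_shared<int>(inputs_[i] * 2);
    std::shared_ptr<int> expected;  // empty: we only publish into an unset slot
    if (std::atomic_compare_exchange_strong(&boxed_[i], &expected, fresh)) {
      return fresh;  // this call performed the one and only write
    }
    // Another caller published first; `expected` now holds its value.
    return expected;
  }

 private:
  std::vector<int> inputs_;
  // Pre-sized in the constructor, so the vector itself never reallocates.
  mutable std::vector<std::shared_ptr<int>> boxed_;
};
```

Because a slot changes only from empty to set, and only once, a caller that has already seen every slot populated can read the vector without racing against a later store.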

Are these changes tested?

Yes, there is a test case TestRecordBatch.ColumnsThreadSafety which passes under TSAN.

Are there any user-facing changes?

No

This PR contains a "Critical Fix".

Without this fix, concurrent calls to SimpleRecordBatch::columns could lead to an invalid memory access and crash.

⚠️ GitHub issue #45371 has been automatically assigned in GitHub to PR creator.

Member

@mapleFU mapleFU left a comment

LGTM, also cc @bkietz @pitrou

auto schema = ::arrow::schema({field("f1", utf8())});
auto record_batch = RecordBatch::Make(schema, length, {array_data});
std::atomic_bool start_flag{false};
std::thread t([record_batch, &start_flag]() {
Member

should more than one thread be tested here?

Member

Yes, this should use several threads that would do the same thing concurrently.

Author

In the current test the race is between t and the main thread. Only 2 threads are necessary to produce the data race.

std::thread t([record_batch, &start_flag]() {
start_flag.store(true);
auto columns = record_batch->columns();
ASSERT_EQ(columns.size(), 1);
Member

Why this test? boxed_columns_ is presized in the constructor, so this should always succeed.

Author

Yes, this assertion should always pass. The purpose is to ensure that columns is not optimized out. This test will either produce a TSAN warning or not, but there is no assertion that will fail consistently under the data race.
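For readers unfamiliar with this style of test, the following is a minimal sketch of a two-thread harness using a start flag. `RunOnTwoThreads` is a hypothetical helper, not part of the PR, and the spin-wait on the calling thread is an assumption about how the two calls are lined up; the test's main-thread code is not shown in this conversation.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>

// Hypothetical helper: runs `work` once on a helper thread and once on the
// calling thread, starting the second call only after the helper is running.
void RunOnTwoThreads(const std::function<void()>& work) {
  assert(work != nullptr);
  std::atomic_bool start_flag{false};
  std::thread t([&work, &start_flag]() {
    start_flag.store(true);  // signal that the helper thread is live
    work();
  });
  // Wait until the helper has started so both calls overlap as much as possible.
  while (!start_flag.load()) {
    std::this_thread::yield();
  }
  work();
  t.join();
}
```

A harness like this only increases the chance that the two calls overlap. It does not make a failure observable through assertions, which is why the detection is left to TSAN.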

random::RandomArrayGenerator gen(42);
std::shared_ptr<ArrayData> array_data = gen.ArrayOf(utf8(), length)->data();
auto schema = ::arrow::schema({field("f1", utf8())});
Member

There could also be several fields and the worker thread would call column(i) several times with i being a random number. Something like (untested):

  constexpr int kNumFields = 40;
  constexpr int kNumThreads = 50;
  auto schema = ::arrow::schema(FieldVector(kNumFields, field("f1", utf8())));
  auto batch = RecordBatch::Make(schema, length, ArrayDataVector(kNumFields, array_data));

  std::random_device rd;
  std::vector<std::thread> threads(kNumThreads);
  for (auto& thread : threads) {
    const auto seed = rd();
    // Capture seed by value: it goes out of scope at the end of each iteration.
    thread = std::thread([&, seed]() {
      std::default_random_engine rng(seed);
      std::uniform_int_distribution<int> field_dist(0, kNumFields - 1);
      for (int i = 0; i < kNumFields; ++i) {
        ASSERT_NE(nullptr, batch->column(field_dist(rng)));
      }
    });
  }
  for (auto& thread : threads) {
    thread.join();
  }

Author

The data race only appears when columns() is called, because it allows non-atomic reads of boxed_columns_. The column(i) function alone is thread safe because it only uses atomics. It will also never return nullptr.
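To make the distinction concrete, here is a simplified model of the two accessors as they behave before this fix. `BatchModel` and its string payload are hypothetical; the structure is inferred from this discussion and the TSAN stack traces, not copied from Arrow's source.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified, hypothetical model of the pre-fix accessors.
class BatchModel {
 public:
  explicit BatchModel(std::vector<std::string> raw)
      : raw_(std::move(raw)), boxed_(raw_.size()) {}

  // Per-slot access: every touch of boxed_[i] goes through the atomic helpers,
  // and the caller receives its own copy of the shared_ptr.
  std::shared_ptr<std::string> column(std::size_t i) const {
    assert(i < boxed_.size());
    std::shared_ptr<std::string> result = std::atomic_load(&boxed_[i]);
    if (!result) {
      result = std::make_shared<std::string>(raw_[i]);
      std::atomic_store(&boxed_[i], result);  // can run once per racing caller
    }
    return result;
  }

  // Whole-vector access: fills every slot, then returns a reference. Callers
  // that copy or index the returned vector read the slots non-atomically.
  const std::vector<std::shared_ptr<std::string>>& columns() const {
    for (std::size_t i = 0; i < boxed_.size(); ++i) column(i);
    return boxed_;
  }

 private:
  std::vector<std::string> raw_;
  mutable std::vector<std::shared_ptr<std::string>> boxed_;
};
```

In this model, `column(i)` never exposes a slot directly, while `columns()` hands out the storage itself. A plain read through that reference can therefore overlap with another thread's late `atomic_store` into the same slot.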

@colin-r-schultz colin-r-schultz force-pushed the recordbatch-columns-thread-safety branch from 87216a6 to 41e23d7 Compare January 29, 2025 19:58
@colin-r-schultz
Author

Thanks for taking the time to review my PR! Let me just clarify the intent behind my test case. It's a minimal example that produces a data race detected by TSAN.

The data race needs only 2 threads; in this case I use the main thread and thread t. Only 1 column is needed as well. When the two threads call auto columns = record_batch->columns(), the following happens:

  1. T1 calls atomic_load(&boxed_columns_[0]) and reads nullptr
  2. T2 calls atomic_load(&boxed_columns_[0]) and reads nullptr
  3. T1 calls MakeArray and then atomic_store(&boxed_columns_[0], result)

Now we simultaneously have T1 reading boxed_columns_[0] in order to copy-construct the columns variable in the test, while T2 calls atomic_store(&boxed_columns_[0], result).
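The interleaving above can be replayed on a single thread to show why the second store exists at all. This is a minimal sketch with hypothetical names (`ReplayResult`, `ReplayLoadThenStore`, `ReplayCompareExchange`) and an `int` payload standing in for the array. It runs the steps sequentially, so it has no race of its own; it only counts how many writes reach the slot under each scheme. The compare-exchange variant assumes `std::atomic_compare_exchange_strong`.

```cpp
#include <atomic>
#include <cassert>
#include <memory>

struct ReplayResult {
  int stores;        // number of writes that reached the slot
  bool same_object;  // whether both "threads" ended up with the same object
};

// Load-then-store initialization, replayed in the order listed above.
ReplayResult ReplayLoadThenStore() {
  std::shared_ptr<int> slot;
  int stores = 0;
  std::shared_ptr<int> t1 = std::atomic_load(&slot);  // step 1: T1 reads nullptr
  std::shared_ptr<int> t2 = std::atomic_load(&slot);  // step 2: T2 reads nullptr
  assert(!t1 && !t2);
  t1 = std::make_shared<int>(1);
  std::atomic_store(&slot, t1);  // step 3: T1 publishes its result
  ++stores;
  t2 = std::make_shared<int>(1);
  std::atomic_store(&slot, t2);  // T2's late store overwrites the slot
  ++stores;
  return {stores, t1 == t2};
}

// Same interleaving, but each "thread" publishes with compare-exchange.
ReplayResult ReplayCompareExchange() {
  std::shared_ptr<int> slot;
  int stores = 0;
  std::shared_ptr<int> t1 = std::atomic_load(&slot);
  std::shared_ptr<int> t2 = std::atomic_load(&slot);
  assert(!t1 && !t2);
  t1 = std::make_shared<int>(1);
  std::shared_ptr<int> expected;  // empty
  if (std::atomic_compare_exchange_strong(&slot, &expected, t1)) {
    ++stores;
  } else {
    t1 = expected;
  }
  t2 = std::make_shared<int>(1);
  expected.reset();
  if (std::atomic_compare_exchange_strong(&slot, &expected, t2)) {
    ++stores;
  } else {
    t2 = expected;  // slot already set: adopt the published value instead
  }
  return {stores, t1 == t2};
}
```

With load-then-store the slot is written twice, and the second write is the one that can overlap with T1's plain read. With compare-exchange the slot is written exactly once, so nothing is left to collide with a reader that has already seen it populated.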

There isn't an assertion that will catch this case, because it is impossible for boxed_columns_[0] to be read as nullptr: it has certainly been initialized by T1. So instead we can rely on TSAN to prove that the data race exists. The output of running this test on the main branch using the ninja-debug-tsan preset is below:

Test output
[ RUN      ] TestRecordBatch.ColumnsThreadSafety
==================
WARNING: ThreadSanitizer: data race (pid=21327)
  Write of size 8 at 0x7b0400000db0 by thread T1 (mutexes: write M811):
    #0 std::enable_if<std::__and_<std::__not_<std::__is_tuple_like<arrow::Array*> >, std::is_move_constructible<arrow::Array*>, std::is_move_assignable<arrow::Array*> >::value, void>::type std::swap<arrow::Array*>(arrow::Array*&, arrow::Array*&) /usr/include/c++/12/bits/move.h:205 (arrow-table-test+0x1ec713)
    #1 std::__shared_ptr<arrow::Array, (__gnu_cxx::_Lock_policy)2>::swap(std::__shared_ptr<arrow::Array, (__gnu_cxx::_Lock_policy)2>&) /usr/include/c++/12/bits/shared_ptr_base.h:1686 (arrow-table-test+0x1db302)
    #2 void std::atomic_store_explicit<arrow::Array>(std::shared_ptr<arrow::Array>*, std::shared_ptr<arrow::Array>, std::memory_order) /usr/include/c++/12/bits/shared_ptr_atomic.h:169 (libarrow.so.2000+0x19dddd0)
    #3 void std::atomic_store<arrow::Array>(std::shared_ptr<arrow::Array>*, std::shared_ptr<arrow::Array>) /usr/include/c++/12/bits/shared_ptr_atomic.h:175 (libarrow.so.2000+0x19d7f42)
    #4 arrow::SimpleRecordBatch::column(int) const /home/user/arrow/cpp/src/arrow/record_batch.cc:106 (libarrow.so.2000+0x19d3cbd)
    #5 arrow::SimpleRecordBatch::columns() const /home/user/arrow/cpp/src/arrow/record_batch.cc:97 (libarrow.so.2000+0x19d3b5a)
    #6 operator() /home/user/arrow/cpp/src/arrow/record_batch_test.cc:407 (arrow-table-test+0x18c951)
    #7 __invoke_impl<void, arrow::TestRecordBatch_ColumnsThreadSafety_Test::TestBody()::<lambda()> > /usr/include/c++/12/bits/invoke.h:61 (arrow-table-test+0x1bb876)
    #8 __invoke<arrow::TestRecordBatch_ColumnsThreadSafety_Test::TestBody()::<lambda()> > /usr/include/c++/12/bits/invoke.h:96 (arrow-table-test+0x1bb7ed)
    #9 _M_invoke<0> /usr/include/c++/12/bits/std_thread.h:279 (arrow-table-test+0x1bb74e)
    #10 operator() /usr/include/c++/12/bits/std_thread.h:286 (arrow-table-test+0x1bb6f4)
    #11 _M_run /usr/include/c++/12/bits/std_thread.h:231 (arrow-table-test+0x1bb6aa)
    #12 <null> <null> (libstdc++.so.6+0xdc252)

  Previous read of size 8 at 0x7b0400000db0 by main thread:
    #0 std::__shared_ptr<arrow::Array, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__shared_ptr<arrow::Array, (__gnu_cxx::_Lock_policy)2> const&) /usr/include/c++/12/bits/shared_ptr_base.h:1522 (arrow-table-test+0xf32b6)
    #1 std::shared_ptr<arrow::Array>::shared_ptr(std::shared_ptr<arrow::Array> const&) /usr/include/c++/12/bits/shared_ptr.h:204 (arrow-table-test+0xf332a)
    #2 void std::_Construct<std::shared_ptr<arrow::Array>, std::shared_ptr<arrow::Array> const&>(std::shared_ptr<arrow::Array>*, std::shared_ptr<arrow::Array> const&) /usr/include/c++/12/bits/stl_construct.h:119 (arrow-table-test+0x116c63)
    #3 std::shared_ptr<arrow::Array>* std::__do_uninit_copy<__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*>(__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, __gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*) /usr/include/c++/12/bits/stl_uninitialized.h:120 (arrow-table-test+0x1119cb)
    #4 std::shared_ptr<arrow::Array>* std::__uninitialized_copy<false>::__uninit_copy<__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*>(__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, __gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*) /usr/include/c++/12/bits/stl_uninitialized.h:137 (arrow-table-test+0x10896d)
    #5 std::shared_ptr<arrow::Array>* std::uninitialized_copy<__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*>(__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, __gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*) /usr/include/c++/12/bits/stl_uninitialized.h:185 (arrow-table-test+0x103b49)
    #6 std::shared_ptr<arrow::Array>* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*, std::shared_ptr<arrow::Array> >(__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, __gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*, std::allocator<std::shared_ptr<arrow::Array> >&) /usr/include/c++/12/bits/stl_uninitialized.h:372 (arrow-table-test+0xfdd56)
    #7 std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > >::vector(std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > const&) /usr/include/c++/12/bits/stl_vector.h:601 (arrow-table-test+0xf726c)
    #8 arrow::TestRecordBatch_ColumnsThreadSafety_Test::TestBody() /home/user/arrow/cpp/src/arrow/record_batch_test.cc:413 (arrow-table-test+0x18cf6a)
    #9 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/user/arrow/cpp/out/build/ninja-debug-tsan/_deps/googletest-src/googletest/src/gtest.cc:2607 (libarrow_gtestd.so.1.11.0+0xd6672)

  Location is heap block of size 16 at 0x7b0400000db0 allocated by main thread:
    #0 operator new(unsigned long) ../../../../src/libsanitizer/tsan/tsan_new_delete.cpp:64 (libtsan.so.2+0x8d7d9)
    #1 std::__new_allocator<std::shared_ptr<arrow::Array> >::allocate(unsigned long, void const*) /usr/include/c++/12/bits/new_allocator.h:137 (arrow-table-test+0x110a98)
    #2 std::allocator_traits<std::allocator<std::shared_ptr<arrow::Array> > >::allocate(std::allocator<std::shared_ptr<arrow::Array> >&, unsigned long) /usr/include/c++/12/bits/alloc_traits.h:464 (arrow-table-test+0x1075ec)
    #3 std::_Vector_base<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > >::_M_allocate(unsigned long) /usr/include/c++/12/bits/stl_vector.h:378 (arrow-table-test+0x10130c)
    #4 std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > >::_M_default_append(unsigned long) /usr/include/c++/12/bits/vector.tcc:657 (libarrow.so.2000+0x19ddaeb)
    #5 std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > >::resize(unsigned long) /usr/include/c++/12/bits/stl_vector.h:1011 (libarrow.so.2000+0x19d7e0d)
    #6 arrow::SimpleRecordBatch::SimpleRecordBatch(std::shared_ptr<arrow::Schema> const&, long, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType, std::shared_ptr<arrow::Device::SyncEvent>) /home/user/arrow/cpp/src/arrow/record_batch.cc:91 (libarrow.so.2000+0x19d3a8e)
    #7 void std::_Construct<arrow::SimpleRecordBatch, std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(arrow::SimpleRecordBatch*, std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) /usr/include/c++/12/bits/stl_construct.h:119 (libarrow.so.2000+0x19f2c75)
    #8 void std::allocator_traits<std::allocator<void> >::construct<arrow::SimpleRecordBatch, std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(std::allocator<void>&, arrow::SimpleRecordBatch*, std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) /usr/include/c++/12/bits/alloc_traits.h:635 (libarrow.so.2000+0x19f0969)
    #9 std::_Sp_counted_ptr_inplace<arrow::SimpleRecordBatch, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(std::allocator<void>, std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) /usr/include/c++/12/bits/shared_ptr_base.h:604 (libarrow.so.2000+0x19ed7ca)
    #10 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<arrow::SimpleRecordBatch, std::allocator<void>, std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(arrow::SimpleRecordBatch*&, std::_Sp_alloc_shared_tag<std::allocator<void> >, std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) <null> (libarrow.so.2000+0x19e9414)
    #11 std::__shared_ptr<arrow::SimpleRecordBatch, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<void>, std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(std::_Sp_alloc_shared_tag<std::allocator<void> >, std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) <null> (libarrow.so.2000+0x19e504c)
    #12 std::shared_ptr<arrow::SimpleRecordBatch>::shared_ptr<std::allocator<void>, std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(std::_Sp_alloc_shared_tag<std::allocator<void> >, std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) <null> (libarrow.so.2000+0x19dec89)
    #13 std::shared_ptr<std::enable_if<!std::is_array<arrow::SimpleRecordBatch>::value, arrow::SimpleRecordBatch>::type> std::make_shared<arrow::SimpleRecordBatch, std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) /usr/include/c++/12/bits/shared_ptr.h:1010 (libarrow.so.2000+0x19d90e8)
    #14 arrow::RecordBatch::Make(std::shared_ptr<arrow::Schema>, long, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType, std::shared_ptr<arrow::Device::SyncEvent>) /home/user/arrow/cpp/src/arrow/record_batch.cc:230 (libarrow.so.2000+0x19c6223)
    #15 arrow::TestRecordBatch_ColumnsThreadSafety_Test::TestBody() /home/user/arrow/cpp/src/arrow/record_batch_test.cc:403 (arrow-table-test+0x18ce84)
    #16 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/user/arrow/cpp/out/build/ninja-debug-tsan/_deps/googletest-src/googletest/src/gtest.cc:2607 (libarrow_gtestd.so.1.11.0+0xd6672)

  Mutex M811 (0x7fffeebff080) created at:
    #0 pthread_mutex_lock ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4324 (libtsan.so.2+0x59bbf)
    #1 std::_Sp_locker::_Sp_locker(void const*) <null> (libstdc++.so.6+0xdb89c)
    #2 std::shared_ptr<arrow::Array> std::atomic_load<arrow::Array>(std::shared_ptr<arrow::Array> const*) /usr/include/c++/12/bits/shared_ptr_atomic.h:138 (libarrow.so.2000+0x19d7ebe)
    #3 arrow::SimpleRecordBatch::column(int) const /home/user/arrow/cpp/src/arrow/record_batch.cc:103 (libarrow.so.2000+0x19d3c20)
    #4 arrow::RecordBatch::Equals(arrow::RecordBatch const&, bool, arrow::EqualOptions const&) const /home/user/arrow/cpp/src/arrow/record_batch.cc:320 (libarrow.so.2000+0x19c75a7)
    #5 arrow::TestRecordBatch_EqualOptions_Test::TestBody() /home/user/arrow/cpp/src/arrow/record_batch_test.cc:105 (arrow-table-test+0x17d808)
    #6 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/user/arrow/cpp/out/build/ninja-debug-tsan/_deps/googletest-src/googletest/src/gtest.cc:2607 (libarrow_gtestd.so.1.11.0+0xd6672)

  Thread T1 (tid=21339, running) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1001 (libtsan.so.2+0x63a59)
    #1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) <null> (libstdc++.so.6+0xdc328)
    #2 arrow::TestRecordBatch_ColumnsThreadSafety_Test::TestBody() /home/user/arrow/cpp/src/arrow/record_batch_test.cc:409 (arrow-table-test+0x18cf11)
    #3 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/user/arrow/cpp/out/build/ninja-debug-tsan/_deps/googletest-src/googletest/src/gtest.cc:2607 (libarrow_gtestd.so.1.11.0+0xd6672)

SUMMARY: ThreadSanitizer: data race /usr/include/c++/12/bits/move.h:205 in std::enable_if<std::__and_<std::__not_<std::__is_tuple_like<arrow::Array*> >, std::is_move_constructible<arrow::Array*>, std::is_move_assignable<arrow::Array*> >::value, void>::type std::swap<arrow::Array*>(arrow::Array*&, arrow::Array*&)
==================
==================
WARNING: ThreadSanitizer: data race (pid=21327)
  Write of size 8 at 0x7b0400000db8 by thread T1 (mutexes: write M811):
    #0 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::_M_swap(std::__shared_count<(__gnu_cxx::_Lock_policy)2>&) /usr/include/c++/12/bits/shared_ptr_base.h:1101 (arrow-table-test+0x10114d)
    #1 std::__shared_ptr<arrow::Array, (__gnu_cxx::_Lock_policy)2>::swap(std::__shared_ptr<arrow::Array, (__gnu_cxx::_Lock_policy)2>&) /usr/include/c++/12/bits/shared_ptr_base.h:1687 (arrow-table-test+0x1db31d)
    #2 void std::atomic_store_explicit<arrow::Array>(std::shared_ptr<arrow::Array>*, std::shared_ptr<arrow::Array>, std::memory_order) /usr/include/c++/12/bits/shared_ptr_atomic.h:169 (libarrow.so.2000+0x19dddd0)
    #3 void std::atomic_store<arrow::Array>(std::shared_ptr<arrow::Array>*, std::shared_ptr<arrow::Array>) /usr/include/c++/12/bits/shared_ptr_atomic.h:175 (libarrow.so.2000+0x19d7f42)
    #4 arrow::SimpleRecordBatch::column(int) const /home/user/arrow/cpp/src/arrow/record_batch.cc:106 (libarrow.so.2000+0x19d3cbd)
    #5 arrow::SimpleRecordBatch::columns() const /home/user/arrow/cpp/src/arrow/record_batch.cc:97 (libarrow.so.2000+0x19d3b5a)
    #6 operator() /home/user/arrow/cpp/src/arrow/record_batch_test.cc:407 (arrow-table-test+0x18c951)
    #7 __invoke_impl<void, arrow::TestRecordBatch_ColumnsThreadSafety_Test::TestBody()::<lambda()> > /usr/include/c++/12/bits/invoke.h:61 (arrow-table-test+0x1bb876)
    #8 __invoke<arrow::TestRecordBatch_ColumnsThreadSafety_Test::TestBody()::<lambda()> > /usr/include/c++/12/bits/invoke.h:96 (arrow-table-test+0x1bb7ed)
    #9 _M_invoke<0> /usr/include/c++/12/bits/std_thread.h:279 (arrow-table-test+0x1bb74e)
    #10 operator() /usr/include/c++/12/bits/std_thread.h:286 (arrow-table-test+0x1bb6f4)
    #11 _M_run /usr/include/c++/12/bits/std_thread.h:231 (arrow-table-test+0x1bb6aa)
    #12 <null> <null> (libstdc++.so.6+0xdc252)

  Previous read of size 8 at 0x7b0400000db8 by main thread:
    #0 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count(std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&) /usr/include/c++/12/bits/shared_ptr_base.h:1075 (arrow-table-test+0xf4f9a)
    #1 std::__shared_ptr<arrow::Array, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__shared_ptr<arrow::Array, (__gnu_cxx::_Lock_policy)2> const&) /usr/include/c++/12/bits/shared_ptr_base.h:1522 (arrow-table-test+0xf32eb)
    #2 std::shared_ptr<arrow::Array>::shared_ptr(std::shared_ptr<arrow::Array> const&) /usr/include/c++/12/bits/shared_ptr.h:204 (arrow-table-test+0xf332a)
    #3 void std::_Construct<std::shared_ptr<arrow::Array>, std::shared_ptr<arrow::Array> const&>(std::shared_ptr<arrow::Array>*, std::shared_ptr<arrow::Array> const&) /usr/include/c++/12/bits/stl_construct.h:119 (arrow-table-test+0x116c63)
    #4 std::shared_ptr<arrow::Array>* std::__do_uninit_copy<__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*>(__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, __gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*) /usr/include/c++/12/bits/stl_uninitialized.h:120 (arrow-table-test+0x1119cb)
    #5 std::shared_ptr<arrow::Array>* std::__uninitialized_copy<false>::__uninit_copy<__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*>(__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, __gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*) /usr/include/c++/12/bits/stl_uninitialized.h:137 (arrow-table-test+0x10896d)
    #6 std::shared_ptr<arrow::Array>* std::uninitialized_copy<__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*>(__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, __gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*) /usr/include/c++/12/bits/stl_uninitialized.h:185 (arrow-table-test+0x103b49)
    #7 std::shared_ptr<arrow::Array>* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*, std::shared_ptr<arrow::Array> >(__gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, __gnu_cxx::__normal_iterator<std::shared_ptr<arrow::Array> const*, std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > >, std::shared_ptr<arrow::Array>*, std::allocator<std::shared_ptr<arrow::Array> >&) /usr/include/c++/12/bits/stl_uninitialized.h:372 (arrow-table-test+0xfdd56)
    #8 std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > >::vector(std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > > const&) /usr/include/c++/12/bits/stl_vector.h:601 (arrow-table-test+0xf726c)
    #9 arrow::TestRecordBatch_ColumnsThreadSafety_Test::TestBody() /home/user/arrow/cpp/src/arrow/record_batch_test.cc:413 (arrow-table-test+0x18cf6a)
    #10 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/user/arrow/cpp/out/build/ninja-debug-tsan/_deps/googletest-src/googletest/src/gtest.cc:2607 (libarrow_gtestd.so.1.11.0+0xd6672)

  Location is heap block of size 16 at 0x7b0400000db0 allocated by main thread:
    #0 operator new(unsigned long) ../../../../src/libsanitizer/tsan/tsan_new_delete.cpp:64 (libtsan.so.2+0x8d7d9)
    #1 std::__new_allocator<std::shared_ptr<arrow::Array> >::allocate(unsigned long, void const*) /usr/include/c++/12/bits/new_allocator.h:137 (arrow-table-test+0x110a98)
    #2 std::allocator_traits<std::allocator<std::shared_ptr<arrow::Array> > >::allocate(std::allocator<std::shared_ptr<arrow::Array> >&, unsigned long) /usr/include/c++/12/bits/alloc_traits.h:464 (arrow-table-test+0x1075ec)
    #3 std::_Vector_base<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > >::_M_allocate(unsigned long) /usr/include/c++/12/bits/stl_vector.h:378 (arrow-table-test+0x10130c)
    #4 std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > >::_M_default_append(unsigned long) /usr/include/c++/12/bits/vector.tcc:657 (libarrow.so.2000+0x19ddaeb)
    #5 std::vector<std::shared_ptr<arrow::Array>, std::allocator<std::shared_ptr<arrow::Array> > >::resize(unsigned long) /usr/include/c++/12/bits/stl_vector.h:1011 (libarrow.so.2000+0x19d7e0d)
    #6 arrow::SimpleRecordBatch::SimpleRecordBatch(std::shared_ptr<arrow::Schema> const&, long, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType, std::shared_ptr<arrow::Device::SyncEvent>) /home/user/arrow/cpp/src/arrow/record_batch.cc:91 (libarrow.so.2000+0x19d3a8e)
    #7 void std::_Construct<arrow::SimpleRecordBatch, std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(arrow::SimpleRecordBatch*, std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) /usr/include/c++/12/bits/stl_construct.h:119 (libarrow.so.2000+0x19f2c75)
    #8 void std::allocator_traits<std::allocator<void> >::construct<arrow::SimpleRecordBatch, std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(std::allocator<void>&, arrow::SimpleRecordBatch*, std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) /usr/include/c++/12/bits/alloc_traits.h:635 (libarrow.so.2000+0x19f0969)
    #9 std::_Sp_counted_ptr_inplace<arrow::SimpleRecordBatch, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(std::allocator<void>, std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) /usr/include/c++/12/bits/shared_ptr_base.h:604 (libarrow.so.2000+0x19ed7ca)
    #10 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<arrow::SimpleRecordBatch, std::allocator<void>, std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(arrow::SimpleRecordBatch*&, std::_Sp_alloc_shared_tag<std::allocator<void> >, std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) <null> (libarrow.so.2000+0x19e9414)
    #11 std::__shared_ptr<arrow::SimpleRecordBatch, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<void>, std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(std::_Sp_alloc_shared_tag<std::allocator<void> >, std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) <null> (libarrow.so.2000+0x19e504c)
    #12 std::shared_ptr<arrow::SimpleRecordBatch>::shared_ptr<std::allocator<void>, std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(std::_Sp_alloc_shared_tag<std::allocator<void> >, std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) <null> (libarrow.so.2000+0x19dec89)
    #13 std::shared_ptr<std::enable_if<!std::is_array<arrow::SimpleRecordBatch>::value, arrow::SimpleRecordBatch>::type> std::make_shared<arrow::SimpleRecordBatch, std::shared_ptr<arrow::Schema>, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent> >(std::shared_ptr<arrow::Schema>&&, long&, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >&&, arrow::DeviceAllocationType&, std::shared_ptr<arrow::Device::SyncEvent>&&) /usr/include/c++/12/bits/shared_ptr.h:1010 (libarrow.so.2000+0x19d90e8)
    #14 arrow::RecordBatch::Make(std::shared_ptr<arrow::Schema>, long, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > >, arrow::DeviceAllocationType, std::shared_ptr<arrow::Device::SyncEvent>) /home/user/arrow/cpp/src/arrow/record_batch.cc:230 (libarrow.so.2000+0x19c6223)
    #15 arrow::TestRecordBatch_ColumnsThreadSafety_Test::TestBody() /home/user/arrow/cpp/src/arrow/record_batch_test.cc:403 (arrow-table-test+0x18ce84)
    #16 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/user/arrow/cpp/out/build/ninja-debug-tsan/_deps/googletest-src/googletest/src/gtest.cc:2607 (libarrow_gtestd.so.1.11.0+0xd6672)

  Mutex M811 (0x7fffeebff080) created at:
    #0 pthread_mutex_lock ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4324 (libtsan.so.2+0x59bbf)
    #1 std::_Sp_locker::_Sp_locker(void const*) <null> (libstdc++.so.6+0xdb89c)
    #2 std::shared_ptr<arrow::Array> std::atomic_load<arrow::Array>(std::shared_ptr<arrow::Array> const*) /usr/include/c++/12/bits/shared_ptr_atomic.h:138 (libarrow.so.2000+0x19d7ebe)
    #3 arrow::SimpleRecordBatch::column(int) const /home/user/arrow/cpp/src/arrow/record_batch.cc:103 (libarrow.so.2000+0x19d3c20)
    #4 arrow::RecordBatch::Equals(arrow::RecordBatch const&, bool, arrow::EqualOptions const&) const /home/user/arrow/cpp/src/arrow/record_batch.cc:320 (libarrow.so.2000+0x19c75a7)
    #5 arrow::TestRecordBatch_EqualOptions_Test::TestBody() /home/user/arrow/cpp/src/arrow/record_batch_test.cc:105 (arrow-table-test+0x17d808)
    #6 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/user/arrow/cpp/out/build/ninja-debug-tsan/_deps/googletest-src/googletest/src/gtest.cc:2607 (libarrow_gtestd.so.1.11.0+0xd6672)

  Thread T1 (tid=21339, running) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1001 (libtsan.so.2+0x63a59)
    #1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) <null> (libstdc++.so.6+0xdc328)
    #2 arrow::TestRecordBatch_ColumnsThreadSafety_Test::TestBody() /home/user/arrow/cpp/src/arrow/record_batch_test.cc:409 (arrow-table-test+0x18cf11)
    #3 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/user/arrow/cpp/out/build/ninja-debug-tsan/_deps/googletest-src/googletest/src/gtest.cc:2607 (libarrow_gtestd.so.1.11.0+0xd6672)

SUMMARY: ThreadSanitizer: data race /usr/include/c++/12/bits/shared_ptr_base.h:1101 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::_M_swap(std::__shared_count<(__gnu_cxx::_Lock_policy)2>&)
==================
[       OK ] TestRecordBatch.ColumnsThreadSafety (295 ms)

Running the test again with the proposed fix shows no data race. The ASSERT_EQ(columns.size(), 1) is just there to make sure the columns variable isn't optimized out.

Not sure if it makes sense to have a test that only detects the bug under TSAN (it still reports OK above despite the race), but I don't think there is any way to surface the bug consistently without tooling.
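For reference, here is a minimal standalone sketch of the write-once pattern described in this PR, exercised from several threads at once. It is not the actual Arrow code: `Boxed`, `LazyColumns`, and `DistinctFirstColumnPointers` are hypothetical names made up for illustration, and the sketch assumes the same reasoning as the PR, namely that a slot is never written again once the first compare-exchange succeeds. It uses the `std::atomic_load` / `std::atomic_compare_exchange_strong` free-function overloads for `std::shared_ptr`, which are deprecated since C++20 in favor of `std::atomic<std::shared_ptr<T>>`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <thread>
#include <vector>

// Hypothetical stand-in for a boxed column; not an Arrow type.
struct Boxed {
  explicit Boxed(int v) : value(v) {}
  int value;
};

// Hypothetical stand-in for a batch that boxes its columns lazily.
class LazyColumns {
 public:
  explicit LazyColumns(std::size_t n) : boxed_(n) {}

  // Returns slot i, creating it on first use. Only one writer ever
  // publishes a pointer into the slot; losers adopt the winner's value.
  std::shared_ptr<Boxed> column(std::size_t i) const {
    assert(i < boxed_.size());
    std::shared_ptr<Boxed> result = std::atomic_load(&boxed_[i]);
    if (!result) {
      auto fresh = std::make_shared<Boxed>(static_cast<int>(i));
      std::shared_ptr<Boxed> expected;  // null: the slot must still be empty
      if (std::atomic_compare_exchange_strong(&boxed_[i], &expected, fresh)) {
        result = fresh;
      } else {
        result = expected;  // on failure, expected holds the published value
      }
    }
    return result;
  }

  // Initializes every slot, then hands out a reference to the vector.
  const std::vector<std::shared_ptr<Boxed>>& columns() const {
    for (std::size_t i = 0; i < boxed_.size(); ++i) column(i);
    return boxed_;
  }

 private:
  mutable std::vector<std::shared_ptr<Boxed>> boxed_;
};

// Starts num_threads threads that all call columns() at roughly the same
// time and returns how many distinct objects they observed in slot 0.
std::size_t DistinctFirstColumnPointers(int num_threads) {
  LazyColumns batch(1);
  std::atomic<bool> start{false};
  std::vector<const Boxed*> seen(static_cast<std::size_t>(num_threads), nullptr);
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&batch, &start, &seen, t] {
      while (!start.load()) {
      }  // spin until every thread has been created
      seen[static_cast<std::size_t>(t)] = batch.columns()[0].get();
    });
  }
  start.store(true);
  for (auto& th : threads) th.join();
  return std::set<const Boxed*>(seen.begin(), seen.end()).size();
}
```

Calling `DistinctFirstColumnPointers(8)` should report a single distinct pointer, since every thread that loses the compare-exchange falls back to the value the winner published. As with the real test, this only shows that the pattern is consistent; proving the absence of a race still needs a tool such as TSAN.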
