GH-32276: [C++][FlightRPC] Align RecordBatch buffers given to IPC #44279
base: main
@pitrou, do you think this fix is viable?
Thanks for picking this up. This seems reasonable to me at a quick glance.
cpp/src/arrow/array/data.h
@@ -23,6 +23,7 @@
#include <memory>
#include <utility>
#include <vector>
#include <arrow/util/range.h>
nit: put this include with the rest of the Arrow includes (and use quotes to be consistent)
done
cpp/src/arrow/array/data.h
}
}
// align children data recursively
for (unsigned int i=0; i<child_data.size(); i++) {
nit: you could iterate with for (auto& child : child_data) and avoid the explicit index?
much better!
python/pyarrow/tests/test_ipc.py
@@ -548,11 +548,16 @@ def test_read_options():
    options = pa.ipc.IpcReadOptions()
    assert options.use_threads is True
    assert options.ensure_native_endian is True
    assert options.ensure_memory_alignment is True
    assert options.ens is True
Where did this come from?
Fixed.
Branch updated from 77cc70a to a5d9e2d.
While attempting to write some unit tests, I found the existing code in arrow/cpp/src/arrow/util/align_util.cc, lines 169 to 205 in e62fbaa. I will try to reuse that method rather than re-implementing it. There is also test infrastructure for misaligned array data.
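The sketch below shows what "misaligned array data" means for such tests. It is not Arrow's actual test infrastructure. MakeMisalignedCopy is a hypothetical helper: it copies values so that their start address lands a chosen number of bytes past an aligned address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical test helper (not Arrow's test utility). Copies `size` bytes from
// `src` into `backing` so that the returned pointer lies exactly `offset` bytes
// past an `alignment`-aligned address. `alignment` must be non-zero.
inline std::uint8_t* MakeMisalignedCopy(std::vector<std::uint8_t>& backing,
                                        const void* src, std::size_t size,
                                        std::size_t alignment, std::size_t offset) {
  // Over-allocate so there is room to skip to an aligned address and then past it.
  backing.assign(size + alignment + offset, 0);
  auto base = reinterpret_cast<std::uintptr_t>(backing.data());
  // Bytes needed to reach the next multiple of `alignment` (0 if already aligned).
  std::size_t pad = (alignment - base % alignment) % alignment;
  std::uint8_t* dst = backing.data() + pad + offset;
  std::memcpy(dst, src, size);
  return dst;
}
```

The copy is read back through std::memcpy, because dereferencing a misaligned int32_t pointer is undefined behaviour in standard C++.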
Can you rebase? Tests appear to be failing.
cpp/src/arrow/ipc/reader.cc
auto batch = RecordBatch::Make(std::move(filtered_schema), metadata->length(),
                               std::move(filtered_columns));
if (context.options.ensure_memory_alignment) {
  return util::EnsureAlignment(batch, arrow::util::kValueAlignment, default_memory_pool());
Can we use the memory pool in context.options.memory_pool?
Sure! Ideally we should use the buffer's memory manager rather than the default CPU manager:
arrow/cpp/src/arrow/memory_pool.cc, lines 907 to 916 in 5ad0b3e:
static std::unique_ptr<PoolBuffer> MakeUnique(MemoryPool* pool, int64_t alignment) {
  std::shared_ptr<MemoryManager> mm;
  if (pool == nullptr) {
    pool = default_memory_pool();
    mm = default_cpu_memory_manager();
  } else {
    mm = CPUDevice::memory_manager(pool);
  }
  return std::make_unique<PoolBuffer>(std::move(mm), pool, alignment);
}
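The nullptr fallback in MakeUnique is the pattern the reviewer is asking for. It honours a caller-supplied pool, such as context.options.memory_pool, and falls back to the default only when none is given. The sketch below assumes that pattern. AllocatorSketch, DefaultAllocatorSketch and ResolvePool are hypothetical stand-ins, not arrow::MemoryPool or its API.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical allocator standing in for arrow::MemoryPool (not its real API).
// Alignments must be powers of two, as C++17 aligned operator new requires.
class AllocatorSketch {
 public:
  virtual ~AllocatorSketch() = default;
  virtual void* Allocate(std::size_t size, std::size_t alignment) {
    return ::operator new(size, std::align_val_t{alignment});
  }
  virtual void Free(void* ptr, std::size_t alignment) {
    ::operator delete(ptr, std::align_val_t{alignment});
  }
};

// Process-wide fallback, playing the role default_memory_pool() plays above.
inline AllocatorSketch* DefaultAllocatorSketch() {
  static AllocatorSketch instance;
  return &instance;
}

// Same shape as the nullptr check in MakeUnique: prefer the configured pool
// (e.g. one carried in the read options), otherwise use the default.
inline AllocatorSketch* ResolvePool(AllocatorSketch* configured) {
  return configured != nullptr ? configured : DefaultAllocatorSketch();
}
```

With this design, a user who configures a tracking or arena pool sees the realignment copies charged to that pool rather than silently going to the global default.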
Branch updated from 960cb21 to 9909f13.
Test: arrow/cpp/src/arrow/util/align_util.cc, lines 44 to 52 in bcb4653.
https://github.com/apache/arrow/actions/runs/11462607112/job/31894398411?pr=44279#step:13:1548
That test complains a lot about arrow/cpp/src/arrow/util/align_util.cc, lines 56 to 76 in bcb4653.
Looks like
Branch updated from f2dae5b to d1219d2.
@westonpace @sanjibansg
I think this is reasonable; we just need to update EnsureAlignment to cover DICTIONARY.
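The sketch below shows why DICTIONARY needs explicit handling. The dictionary values live in a separate node from the indices, so a traversal that only visits a node's buffers and children never checks them. This is a hypothetical, simplified model: NodeSketch and AllAligned are illustrative, not arrow::ArrayData or EnsureAlignment. The children loop uses the range-based style suggested earlier in the review.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-in for arrow::ArrayData (not the real class):
// one values buffer whose required alignment is `byte_width`, child arrays,
// and, for dictionary-encoded arrays, a separate dictionary node.
struct NodeSketch {
  std::uintptr_t byte_width = 1;
  const void* values = nullptr;  // nullptr: no values buffer to check
  std::vector<std::shared_ptr<NodeSketch>> children;
  std::shared_ptr<NodeSketch> dictionary;  // non-null only for dictionary arrays
};

// True if this node, its children and its dictionary (if any) are all aligned.
bool AllAligned(const NodeSketch& node) {
  if (node.values != nullptr &&
      reinterpret_cast<std::uintptr_t>(node.values) % node.byte_width != 0) {
    return false;
  }
  for (const auto& child : node.children) {  // range-based loop, no explicit index
    if (child && !AllAligned(*child)) return false;
  }
  // The dictionary is separate data from the indices; a traversal that only
  // visits values and children would never reach it.
  return node.dictionary == nullptr || AllAligned(*node.dictionary);
}
```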
Branch updated from cc4facf to a1ad4da.
The problem was that the tests define an … This means that user code that defines …
I'm not sure I follow. An extension type should be treated the same as its storage type. I think the problem with the current code is that it blindly casts the array type to the storage type, not accounting for the fact that the point of the storage type is to allow the extension type.
Any non-dictionary type (including extension types) should therefore be covered with … Then that fix should be sound and safe. See arrow/cpp/src/arrow/util/align_util.cc, lines 47 to 59 in 7a01029.
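The sketch below shows an extension type being checked exactly like its storage type. It unwraps only when the type actually is an extension type, instead of casting unconditionally. The type model (TypeId, TypeSketch, StorageOf, RequiredAlignment) is hypothetical and much simpler than Arrow's DataType hierarchy.

```cpp
#include <cassert>
#include <memory>

// Hypothetical, simplified type model (not Arrow's DataType hierarchy).
enum class TypeId { kInt8, kInt32, kInt64, kDouble, kExtension };

struct TypeSketch {
  TypeId id;
  std::shared_ptr<TypeSketch> storage;  // set only when id == kExtension
};

// Unwrap to the storage type only when the type really is an extension type;
// every other type is returned unchanged.
const TypeSketch& StorageOf(const TypeSketch& type) {
  if (type.id == TypeId::kExtension && type.storage != nullptr) return *type.storage;
  return type;
}

// Alignment in bytes that the values buffer needs. An extension array is held
// to the same requirement as its storage. Returns 0 when it cannot be determined.
int RequiredAlignment(const TypeSketch& type) {
  switch (StorageOf(type).id) {
    case TypeId::kInt8: return 1;
    case TypeId::kInt32: return 4;
    case TypeId::kInt64:
    case TypeId::kDouble: return 8;
    case TypeId::kExtension: return 0;  // extension without storage: unknown
  }
  return 0;
}
```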
Branch updated from eb25914 to 7a01029.
Branch updated from 7a01029 to 394e1ae.
Rationale for this change
Data retrieved via IPC is expected to provide memory-aligned arrays, but data retrieved via the C++ Flight client is misaligned. DataFusion (Rust), which requires proper alignment, cannot handle such data: #43552.
What changes are included in this PR?
This PR aligns RecordBatch array buffers decoded by IPC when they are misaligned with respect to the data type's byte width.
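The per-buffer step described above can be sketched as follows. The buffer address is compared with the type's byte width, and the data is copied only when it is misaligned, so well-aligned data stays zero-copy. This is an illustrative C++17 sketch that uses aligned operator new rather than an Arrow memory pool. EnsureBufferAligned and AlignedBytes are hypothetical names, not the PR's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <new>

// Owning handle for memory obtained from aligned operator new (hypothetical).
struct AlignedDeleter {
  std::size_t alignment;
  void operator()(void* p) const { ::operator delete(p, std::align_val_t{alignment}); }
};
using AlignedBytes = std::unique_ptr<void, AlignedDeleter>;

// If `data` is not a multiple of `byte_width` (the value width of the column's
// type, assumed to be a power of two), copy it into freshly allocated memory
// with that alignment. Returns the pointer to use; `owner` keeps any copy alive.
const void* EnsureBufferAligned(const void* data, std::size_t size,
                                std::size_t byte_width, AlignedBytes& owner) {
  if (data == nullptr || reinterpret_cast<std::uintptr_t>(data) % byte_width == 0) {
    return data;  // already aligned: no copy, the common case
  }
  owner = AlignedBytes(::operator new(size, std::align_val_t{byte_width}),
                       AlignedDeleter{byte_width});
  std::memcpy(owner.get(), data, size);
  return owner.get();
}
```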
The implementation mirrors that of align_buffers in arrow-rs (apache/arrow-rs#4681).
Are these changes tested?
The configuration flag is tested in a unit test.
Manually tested end to end that memory alignment fixes the issue, using the reproduction code provided in #43552.
Are there any user-facing changes?
Memory alignment is checked and fixed by default. This is configurable via IpcReadOptions.ensure_memory_alignment.
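The sketch below makes the default-on behaviour and the opt-out concrete. It assumes a hypothetical options struct that mirrors the flag described above. ReadOptionsSketch and WouldRealign are illustrative names, not the real arrow::ipc::IpcReadOptions API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical options struct mirroring the flag's described default
// (not the real arrow::ipc::IpcReadOptions).
struct ReadOptionsSketch {
  bool ensure_memory_alignment = true;  // on by default: check and fix
};

// Decides whether a decoded buffer would be copied into aligned memory.
// With the flag off, buffers are used as-is even when misaligned.
bool WouldRealign(const ReadOptionsSketch& options, const void* data,
                  std::uintptr_t byte_width) {
  if (!options.ensure_memory_alignment || data == nullptr) return false;
  return reinterpret_cast<std::uintptr_t>(data) % byte_width != 0;
}
```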