-
Notifications
You must be signed in to change notification settings - Fork 176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Fix arrow error when sorting on empty batch #271
Conversation
core/Cargo.toml
Outdated
datafusion = { default-features = false, git = "https://github.com/viirya/arrow-datafusion.git", rev = "111a940", features = ["unicode_expressions"] } | ||
datafusion-functions = { git = "https://github.com/viirya/arrow-datafusion.git", rev = "111a940" } | ||
datafusion-physical-expr = { git = "https://github.com/viirya/arrow-datafusion.git", rev = "111a940", default-features = false, features = ["unicode_expressions"] } | ||
datafusion-common = { git = "https://github.com/viirya/arrow-datafusion.git", rev = "83d5782" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This points to the upstream fix: apache/datafusion#10094
datafusion = { default-features = false, git = "https://github.com/viirya/arrow-datafusion.git", rev = "111a940", features = ["unicode_expressions"] } | ||
datafusion-functions = { git = "https://github.com/viirya/arrow-datafusion.git", rev = "111a940" } | ||
datafusion-physical-expr = { git = "https://github.com/viirya/arrow-datafusion.git", rev = "111a940", default-features = false, features = ["unicode_expressions"] } | ||
datafusion-common = { git = "https://github.com/viirya/arrow-datafusion.git", rev = "57b3be4" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, should we wait until the DF PR is merged and use https://github.com/apache/arrow-datafusion.git
instead of your personal fork?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is pointed to my fork of DataFusion not just for this PR. We uses the fork of arrow-rs to use a workaround for the Java Arrow bug. So we must use the fork too in our DataFusion crate, otherwise rust compiler will complain duplicated structs etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I missed #239. It is OK as long as we don't stay in this state for too long 😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, we will change back to official arrow-rs and DataFusion once we get new Java Arrow release.
I will hold this until the DataFusion patch is merged. Thanks. |
RecordBatch::try_new(self.schema.clone(), vectors).map_err(|e| arrow_datafusion_err!(e)) | ||
|
||
let options = RecordBatchOptions::new().with_row_count(Some(batch.num_rows())); | ||
RecordBatch::try_new_with_options(self.schema.clone(), vectors, &options) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should also check is there any other invocation of RecordBatch::try_new
in the codebase and replace it with RecordBatch::try_new_with_options
if necessary.
I just did a code search, maybe we probably need to revise the code in Expand.rs too: https://github.com/apache/arrow-datafusion-comet/blob/main/core/src/execution/datafusion/operators/expand.rs#L172
Of course, it should be done in a follow up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Although I'm not sure if it is possible to have Expand
with empty projections, it is good to have it here too. I will add it in this PR as it is a simple change.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #271 +/- ##
============================================
- Coverage 33.47% 33.46% -0.01%
Complexity 795 795
============================================
Files 110 110
Lines 37533 37533
Branches 8215 8215
============================================
- Hits 12563 12562 -1
- Misses 22322 22323 +1
Partials 2648 2648 ☔ View full report in Codecov by Sentry. |
Merged. Thanks. |
* fix: Fix arrow error when sorting on empty batch * Add RecordBatchOptions for Expand
Which issue does this PR close?
Closes #270.
Rationale for this change
What changes are included in this PR?
How are these changes tested?