Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve default implementation of Array::is_nullable #6721

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

findepi
Copy link
Member

@findepi findepi commented Nov 12, 2024

Description

Improve default implementation of Array::is_nullable

Since is_nullable returns false if the array is guaranteed to not contain any logical nulls, the correct default implementation could leverage self.logical_null_count more appropriately than self.null_count.

To reduce chance of negative surprises in downstream projects, we could introduce the new behavior under is_logically_nullable name and deprecate is_nullable without changing it.

Which issue does this PR close?

None

Rationale for this change

Fix technically incorrect default implementation of Array::is_nullable

What changes are included in this PR?

Change default implementation of Array::is_nullable.

Are there any user-facing changes?

No

@github-actions github-actions bot added the arrow Changes to the arrow crate label Nov 12, 2024
@@ -315,8 +315,7 @@ pub trait Array: std::fmt::Debug + Send + Sync {
/// even if the nulls present in [`DictionaryArray::values`] are not referenced by any key,
/// and therefore would not appear in [`Array::logical_nulls`].
fn is_nullable(&self) -> bool {
// TODO this is not necessarily perfect default implementation, since null_count() and logical_null_count() are not always equivalent
self.null_count() != 0
self.logical_null_count() != 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we provide alternative implementations DictionaryArray and UnionArray as part of this, currently this will significantly regress performance for them

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the logical_null_count implementation should obviously be consistent with logical_nulls implementation, so let's do #6740 first

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i added explicit implementation of DictionaryArray::logical_null_count
i am not sure what to do about UnionArray. UnionArray::logical_nulls has non trivial logic. Can you please advise how this should be approached the best way? should I just copy that function?

cc @alamb

@tustvold
Copy link
Contributor

We should also update the doc comment for this as well, as currently it is highlighted as pessimisable

@findepi findepi force-pushed the findepi/improve-default-implementation-of-array-is-nullable-7e779e branch from 2d55feb to f8ed2cc Compare November 16, 2024 21:55
Since is_nullable returns `false` if the array is guaranteed to not
contain any logical nulls, the correct default implementation could
leverage `self.logical_null_count` more appropriately than
`self.null_count`.

To reduce chance of negative surprises in downstream projects, we could
introduce the new behavior under `is_logically_nullable` name and
deprecate `is_nullable` without changing it.
@findepi findepi force-pushed the findepi/improve-default-implementation-of-array-is-nullable-7e779e branch from f8ed2cc to 9e6fbf8 Compare November 19, 2024 08:50
@findepi
Copy link
Member Author

findepi commented Nov 19, 2024

thank you @tustvold for merging #6740.
i rebased this one. Looking forward to further advice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants