Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add custom (de)serialization methods for special float value #16258

Merged
merged 12 commits into from
Sep 9, 2024

Conversation

dqhl76
Copy link
Collaborator

@dqhl76 dqhl76 commented Aug 16, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

fixes: #16213

  • Implemented custom serialization and deserialization for F32 and F64 types in NumberScalar to handle infinite/Nan float values.

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

This change is Reviewable

@github-actions github-actions bot added the pr-bugfix this PR patches a bug in codebase label Aug 16, 2024
@dqhl76 dqhl76 marked this pull request as ready for review August 16, 2024 09:53
@dqhl76 dqhl76 requested review from sundy-li and andylokandy and removed request for sundy-li August 16, 2024 09:53
@andylokandy
Copy link
Collaborator

Thank you!

@andylokandy
Copy link
Collaborator

andylokandy commented Aug 16, 2024

@sundy-li Is supporting infinite for all NumberScalar a good idea? As there is no standard for inf serialization, I'd concern about the forward compatibility because NumberScalar could be persist in meta or settings.

On the other aspect, this is an user interface change. We used to / planned to support 'inf'::float64 as the recommended way to construct a inf float. If we accept this PR, it'll be implying that we will support auto cast from string to numeric types, which is currently denied.

@dqhl76
Copy link
Collaborator Author

dqhl76 commented Aug 16, 2024

I'd concern about the forward compatibility because NumberScalar could be persist in meta or settings.

I add a condition serializer.is_human_readable() to handle the inf case. I find seems we use bincode and msgpack in meta related(they are not human readable but json is). It could be not affect? Not sure about that 0.0

@sundy-li
Copy link
Member

I find seems we use bincode and msgpack in meta related(they are not human readable but json is). It could be not affect? Not sure about that 0.0

Yes.

As there is no standard for inf serialization, I'd concern about the forward compatibility because NumberScalar could be persist in meta or settings.

@andylokandy If infor nan is already serialized into meta or settings via serde_json, we can't deserialize it back, it could throw errors, so this pr can solve the problem.

Now we can also cast string into float64 (but it's not auto-cast)

🐳 :) select '+inf'::float64, '-inf'::float64, 'nan'::float64, 'Infinity'::float64, '-Infinity'::float64, 'Nan'::float64;
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ '+inf'::float64 │ '-inf'::float64 │ 'nan'::float64 │ 'infinity'::float64 │ '-infinity'::float64 │ 'nan'::float64 │
│     Float64     │     Float64     │     Float64    │       Float64       │        Float64       │     Float64    │
├─────────────────┼─────────────────┼────────────────┼─────────────────────┼──────────────────────┼────────────────┤
│             inf │            -inf │            NaN │                 inf │                 -inf │            NaN │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
1 row read in 0.048 sec. Processed 1 row, 1B (21.05 rows/s, 21B/s)

@dqhl76 dqhl76 marked this pull request as draft August 19, 2024 02:21
@dqhl76
Copy link
Collaborator Author

dqhl76 commented Sep 3, 2024

Sorry for some time leave.

This PR was put on hold because overriding the (de)serialization methods for F32 and F64 still doesn't address the issue. There are parts of the code, such as ValueType::Scalar, that use borsh-serde but are unaffected by this PR's override methods.
img_v3_02dt_20858391-51ef-4763-9a71-7b2b6fbbf30g

To work around this problem, I patched borsh-rs to remove the NaN restriction: borsh-rs NaN bypass patch. It's unlikely that the upstream will accept these changes due to potential cross-platform issues with NaN, but we don't encounter this problem in our case.

img_v3_02dt_1164255e-c0e3-40cd-870b-a94abb0f65cg

@sundy-li
Copy link
Member

sundy-li commented Sep 3, 2024

It's unlikely that the upstream will accept these changes due to potential cross-platform issues with NaN, but we don't encounter this problem in our case.

You can make it a feature gate that defaults to be disabled. Send a pr to upstream, let the maintainer decide whether to accept it.

@dqhl76
Copy link
Collaborator Author

dqhl76 commented Sep 3, 2024

It's unlikely that the upstream will accept these changes due to potential cross-platform issues with NaN, but we don't encounter this problem in our case.

You can make it a feature gate that defaults to be disabled. Send a pr to upstream, let the maintainer decide whether to accept it.

Ok, I will send a PR to upstream later

@dqhl76
Copy link
Collaborator Author

dqhl76 commented Sep 4, 2024

You can make it a feature gate that defaults to be disabled. Send a pr to upstream, let the maintainer decide whether to accept it.

I opened a PR yesterday. Maintainers have some concerns about that. near/borsh-rs#308

@sundy-li
Copy link
Member

sundy-li commented Sep 4, 2024

LGTM, maybe we can do better.

#[derive(Default, Clone, Copy)]
#[repr(transparent)]
struct F32(OrderedFloat<f32>);
struct F64(OrderedFloat<f64>);

then let's implement serde, ord, eq for the new struct

And AsReffrom(f32), ... for it.

@dqhl76 dqhl76 marked this pull request as ready for review September 9, 2024 08:05
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. A-query Area: databend query C-bug Category: something isn't working labels Sep 9, 2024
@dqhl76
Copy link
Collaborator Author

dqhl76 commented Sep 9, 2024

Hi @sundy-li, Could you please take a look again? This commit is what I actually edited.

@dqhl76 dqhl76 changed the title fix: add custom (de)serialization for infinite float values fix: add custom (de)serialization methods for special float value Sep 9, 2024
@andylokandy
Copy link
Collaborator

I'm concerned about moving the whole lib into databend. As @sundy-li said, we can impl the custom serde logic for F32 and F64.

Even if we decide to vendor the crate, it's better to give it its own crate rather than moving into databend-common-base.

@dqhl76
Copy link
Collaborator Author

dqhl76 commented Sep 9, 2024

Thanks for your review.

I'm concerned about moving the whole lib into databend. As @sundy-li said, we can impl the custom serde logic for F32 and F64.

If we implement these traits (like serde, Ord, Eq, AsRef, From(f32)) ourselves, we’re essentially just duplicating what the ordered-float library already does. In that case, we might vendor the lib directly.

Even if we decide to vendor the crate, it's better to give it its own crate rather than moving into databend-common-base.

It sounds good to me. I didn't put it into a crate becuase I thought it is a one file crate.

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Sep 9, 2024
@sundy-li sundy-li added this pull request to the merge queue Sep 9, 2024
@BohuTANG BohuTANG removed this pull request from the merge queue due to a manual request Sep 9, 2024
@BohuTANG BohuTANG merged commit 20c6964 into databendlabs:main Sep 9, 2024
108 checks passed
@dqhl76 dqhl76 deleted the fix-float-null branch September 14, 2024 05:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-query Area: databend query C-bug Category: something isn't working lgtm This PR has been approved by a maintainer pr-bugfix this PR patches a bug in codebase size:XXL This PR changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: Insert and select Double INF value cause error
4 participants