fix: add custom (de)serialization methods for special float value #16258

dqhl76 · 2024-08-16T01:26:25Z

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

fixes: #16213

Implemented custom serialization and deserialization for the F32 and F64 types in NumberScalar to handle infinite and NaN float values.

Tests

Unit Test
Logic Test
Benchmark Test
No Test - Explain why

Type of change

Bug Fix (non-breaking change which fixes an issue)
New Feature (non-breaking change which adds functionality)
Breaking Change (fix or feature that could cause existing functionality not to work as expected)
Documentation Update
Refactoring
Performance Improvement
Other (please describe):

andylokandy · 2024-08-16T09:58:48Z

Thank you!

tests/sqllogictests/suites/base/issues/issue_16213.test

andylokandy · 2024-08-16T10:11:47Z

@sundy-li Is supporting infinity for all NumberScalar a good idea? As there is no standard for inf serialization, I'm concerned about forward compatibility, because NumberScalar could be persisted in meta or settings.

On the other hand, this is a user interface change. We used to support (or planned to support) 'inf'::float64 as the recommended way to construct an inf float. If we accept this PR, it implies that we will support auto-cast from string to numeric types, which is currently not allowed.

dqhl76 · 2024-08-16T10:18:18Z

I'm concerned about forward compatibility, because NumberScalar could be persisted in meta or settings.

I added a condition on serializer.is_human_readable() to handle the inf case. It seems we use bincode and msgpack for meta-related data (they are not human-readable, but JSON is), so it might not be affected? Not sure about that 0.0
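A minimal sketch of the branching described above, using only the standard library. The `Encoded` enum and the `encode_f64` / `decode_f64` helpers are hypothetical stand-ins and are not serde or Databend APIs. In the real change the `human_readable` flag would come from `Serializer::is_human_readable()`, and the string spellings chosen here are an assumption for illustration.

```rust
/// Hypothetical stand-in for a serialized float; this is not a serde type.
#[derive(Debug, PartialEq)]
pub enum Encoded {
    /// Finite value written as a plain number in a human-readable format.
    Number(f64),
    /// Special value written as a string in a human-readable format.
    Text(&'static str),
    /// Binary formats keep the raw little-endian IEEE 754 bytes.
    Bytes([u8; 8]),
}

/// Encode a float, special-casing NaN and infinities only for
/// human-readable output.
pub fn encode_f64(v: f64, human_readable: bool) -> Encoded {
    if !human_readable {
        return Encoded::Bytes(v.to_le_bytes());
    }
    if v.is_nan() {
        Encoded::Text("NaN")
    } else if v == f64::INFINITY {
        Encoded::Text("Infinity")
    } else if v == f64::NEG_INFINITY {
        Encoded::Text("-Infinity")
    } else {
        Encoded::Number(v)
    }
}

/// Decode the value back; unknown strings are rejected.
pub fn decode_f64(e: &Encoded) -> Option<f64> {
    match e {
        Encoded::Number(n) => Some(*n),
        Encoded::Bytes(b) => Some(f64::from_le_bytes(*b)),
        Encoded::Text(s) => match *s {
            "NaN" => Some(f64::NAN),
            "Infinity" => Some(f64::INFINITY),
            "-Infinity" => Some(f64::NEG_INFINITY),
            _ => None,
        },
    }
}
```

Under this assumption, a binary format never sees the string form, so data already persisted in a non-human-readable encoding would keep its layout.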

sundy-li · 2024-08-16T12:49:35Z

It seems we use bincode and msgpack for meta-related data (they are not human-readable, but JSON is), so it might not be affected? Not sure about that 0.0

Yes.

As there is no standard for inf serialization, I'm concerned about forward compatibility, because NumberScalar could be persisted in meta or settings.

@andylokandy If inf or nan has already been serialized into meta or settings via serde_json, we can't deserialize it back and it could throw errors, so this PR can solve the problem.

Now we can also cast string into float64 (but it's not auto-cast)

🐳 :) select '+inf'::float64, '-inf'::float64, 'nan'::float64, 'Infinity'::float64, '-Infinity'::float64, 'Nan'::float64;
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ '+inf'::float64 │ '-inf'::float64 │ 'nan'::float64 │ 'infinity'::float64 │ '-infinity'::float64 │ 'nan'::float64 │
│     Float64     │     Float64     │     Float64    │       Float64       │        Float64       │     Float64    │
├─────────────────┼─────────────────┼────────────────┼─────────────────────┼──────────────────────┼────────────────┤
│             inf │            -inf │            NaN │                 inf │                 -inf │            NaN │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
1 row read in 0.048 sec. Processed 1 row, 1B (21.05 rows/s, 21B/s)
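For comparison, Rust's standard float parser accepts the same family of spellings shown in the query above. This is only an illustration of explicit string-to-float conversion; `parse_special` is a hypothetical helper and says nothing about how Databend implements the cast.

```rust
/// Hypothetical helper: parse a float literal with the standard library.
/// `str::parse::<f64>` accepts "inf", "infinity" and "nan" (with an
/// optional sign) case-insensitively.
pub fn parse_special(s: &str) -> Option<f64> {
    s.trim().parse::<f64>().ok()
}
```

As with the explicit `::float64` cast, the conversion only happens when the caller asks for it; nothing here turns an arbitrary string into a number implicitly.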

tests/sqllogictests/suites/base/issues/issue_16213.test

dqhl76 · 2024-09-03T06:11:10Z

Sorry for being away for a while.

This PR was put on hold because overriding the (de)serialization methods for F32 and F64 still doesn't address the issue. There are parts of the code, such as ValueType::Scalar, that use borsh-serde but are unaffected by this PR's override methods.

To work around this problem, I patched borsh-rs to remove the NaN restriction: borsh-rs NaN bypass patch. It's unlikely that the upstream will accept these changes due to potential cross-platform issues with NaN, but we don't encounter this problem in our case.
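The NaN restriction and its bypass can be sketched roughly as follows. `write_f64`, `read_f64`, and the `allow_nan` flag are hypothetical names for illustration and are not the borsh-rs API; the sketch only assumes that the binary writer refuses NaN unless told otherwise.

```rust
/// Hypothetical binary writer: appends the little-endian IEEE 754 bytes
/// of `v`, refusing NaN unless `allow_nan` is set.
pub fn write_f64(buf: &mut Vec<u8>, v: f64, allow_nan: bool) -> Result<(), String> {
    if v.is_nan() && !allow_nan {
        // NaN has many bit patterns, which is the portability concern.
        return Err("NaN is rejected for portability reasons".to_string());
    }
    buf.extend_from_slice(&v.to_bits().to_le_bytes());
    Ok(())
}

/// Hypothetical reader: decodes the first eight bytes back into an f64.
pub fn read_f64(bytes: &[u8]) -> Option<f64> {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(bytes.get(..8)?);
    Some(f64::from_bits(u64::from_le_bytes(raw)))
}
```

Infinities pass through either way, since they have a single bit pattern per sign; only NaN needs the opt-in.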

sundy-li · 2024-09-03T06:35:20Z

It's unlikely that the upstream will accept these changes due to potential cross-platform issues with NaN, but we don't encounter this problem in our case.

You can put it behind a feature gate that is disabled by default. Send a PR upstream and let the maintainers decide whether to accept it.

dqhl76 · 2024-09-03T06:36:22Z

It's unlikely that the upstream will accept these changes due to potential cross-platform issues with NaN, but we don't encounter this problem in our case.

You can put it behind a feature gate that is disabled by default. Send a PR upstream and let the maintainers decide whether to accept it.

Ok, I will send a PR to upstream later

dqhl76 · 2024-09-04T02:46:38Z

You can put it behind a feature gate that is disabled by default. Send a PR upstream and let the maintainers decide whether to accept it.

I opened a PR yesterday. Maintainers have some concerns about that. near/borsh-rs#308

sundy-li · 2024-09-04T07:38:11Z

LGTM, maybe we can do better.

#[derive(Default, Clone, Copy)]
#[repr(transparent)]
struct F32(OrderedFloat<f32>);

#[derive(Default, Clone, Copy)]
#[repr(transparent)]
struct F64(OrderedFloat<f64>);

Then let's implement serde, Ord, and Eq for the new structs, along with AsRef, From<f32>, and so on.
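A rough, standard-library-only sketch of what such a newtype could look like. It wraps a plain `f64` instead of `OrderedFloat<f64>` so that it compiles without external crates, and the ordering (NaN equal to itself and greater than every other value) is an assumption chosen to resemble what ordered-float provides. The serde implementations are left out.

```rust
use std::cmp::Ordering;

/// Hypothetical newtype with a total order over all f64 values.
#[derive(Debug, Default, Clone, Copy)]
#[repr(transparent)]
pub struct F64(pub f64);

impl Ord for F64 {
    fn cmp(&self, other: &Self) -> Ordering {
        match self.0.partial_cmp(&other.0) {
            Some(ordering) => ordering,
            // At least one side is NaN: NaN sorts after everything else,
            // and two NaNs compare equal.
            None => self.0.is_nan().cmp(&other.0.is_nan()),
        }
    }
}

impl PartialOrd for F64 {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for F64 {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for F64 {}

impl From<f64> for F64 {
    fn from(v: f64) -> Self {
        F64(v)
    }
}

impl AsRef<f64> for F64 {
    fn as_ref(&self) -> &f64 {
        &self.0
    }
}
```

With `Ord` and `Eq` in place, values of this type can be sorted or used as map keys even when they hold NaN or infinities.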

dqhl76 · 2024-09-09T08:06:17Z

Hi @sundy-li, could you please take another look? This commit is what I actually edited.

andylokandy · 2024-09-09T09:40:27Z

I'm concerned about moving the whole lib into databend. As @sundy-li said, we can impl the custom serde logic for F32 and F64.

Even if we decide to vendor the crate, it's better to give it its own crate rather than moving into databend-common-base.

dqhl76 · 2024-09-09T10:27:39Z

Thanks for your review.

I'm concerned about moving the whole lib into databend. As @sundy-li said, we can impl the custom serde logic for F32 and F64.

If we implement these traits (like serde, Ord, Eq, AsRef, From<f32>) ourselves, we’re essentially just duplicating what the ordered-float library already does. In that case, we might as well vendor the lib directly.

Even if we decide to vendor the crate, it's better to give it its own crate rather than moving into databend-common-base.

Sounds good to me. I didn't put it into its own crate because I thought of it as a one-file crate.

fix: add custom (de)serialization for infinite float values

b37ab48

github-actions bot added the pr-bugfix this PR patches a bug in codebase label Aug 16, 2024

fix: fix serialize and deserialize error for infinity float case

a65e4f6

dqhl76 marked this pull request as ready for review August 16, 2024 09:53

dqhl76 requested review from sundy-li and andylokandy and removed request for sundy-li August 16, 2024 09:53

sundy-li reviewed Aug 16, 2024

tests/sqllogictests/suites/base/issues/issue_16213.test

sundy-li approved these changes Aug 19, 2024

andylokandy reviewed Aug 19, 2024

tests/sqllogictests/suites/base/issues/issue_16213.test Outdated

test: add more tests for inf and nan

b3dce37

dqhl76 marked this pull request as draft August 19, 2024 02:21

fix

6cf9ded

andylokandy reviewed Aug 19, 2024

tests/sqllogictests/suites/base/issues/issue_16213.test

dqhl76 added 2 commits August 19, 2024 13:39

fix: add borsh f32 and f64 replaced method for nan

73c14ad

chore: patch borsh-rs to bypass NaN check

9893150

dqhl76 force-pushed the fix-float-null branch from d9bca1c to 9893150 September 3, 2024 06:09

make taplo happy

43868dd

dqhl76 added 2 commits September 8, 2024 22:53

refactor: put ordered float file in base

52aa4d9

fix: override ordered_float's json and borsh (de)serde methods to allow use NaN and inf

99c6fc3

dqhl76 added 3 commits September 9, 2024 00:06

taplo fmt

22e9524

Merge branch 'main' into fix-float-null

469da63

test: add more test

1fc42a6

dqhl76 marked this pull request as ready for review September 9, 2024 08:05

dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. A-query Area: databend query C-bug Category: something isn't working labels Sep 9, 2024

dqhl76 changed the title ~~fix: add custom (de)serialization for infinite float values~~ fix: add custom (de)serialization methods for special float value Sep 9, 2024

sundy-li approved these changes Sep 9, 2024

dosubot bot added the lgtm This PR has been approved by a maintainer label Sep 9, 2024

sundy-li added this pull request to the merge queue Sep 9, 2024

BohuTANG removed this pull request from the merge queue due to a manual request Sep 9, 2024

BohuTANG merged commit 20c6964 into databendlabs:main Sep 9, 2024
108 checks passed

dqhl76 deleted the fix-float-null branch September 14, 2024 05:37

andylokandy mentioned this pull request Oct 15, 2024

feat: implement StringColumn using StringViewArray #16610

Merged

11 tasks

andylokandy mentioned this pull request Oct 25, 2024

refactor: refine cast variant to map #16691

Merged

11 tasks
