Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add binary string operations (length and concatenation) #3646

Open
wants to merge 32 commits into
base: main
Choose a base branch
from

Conversation

f4t4nt
Copy link

@f4t4nt f4t4nt commented Jan 7, 2025

Added two new binary string operations to Daft:\n\n- binary.length(): Get length of binary strings in bytes\n- binary.concat(): Concatenate two binary strings\n\nImplemented in Rust with full test coverage including ASCII, UTF-8, and special binary sequences.

@f4t4nt f4t4nt changed the title Add Binary String Operations: Length and Concatenation feat: add binary string operations (length and concatenation) Jan 7, 2025
@github-actions github-actions bot added the feat label Jan 7, 2025
@kevinzwang kevinzwang self-requested a review January 7, 2025 18:58
@f4t4nt f4t4nt force-pushed the nishant-binary-string branch from 18b8ab2 to 3e0dae6 Compare January 7, 2025 19:01
Copy link

codspeed-hq bot commented Jan 7, 2025

CodSpeed Performance Report

Merging #3646 will improve performances by 34.3%

Comparing nishant-binary-string (c887532) with main (6157ad8)

Summary

⚡ 1 improvements
✅ 26 untouched benchmarks

Benchmarks breakdown

Benchmark main nishant-binary-string Change
test_iter_rows_first_row[100 Small Files] 192.6 ms 143.4 ms +34.3%

Copy link
Member

@kevinzwang kevinzwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great so far!

src/daft-core/src/array/ops/binary.rs Outdated Show resolved Hide resolved
src/daft-core/src/array/ops/binary.rs Outdated Show resolved Hide resolved
src/daft-core/src/array/ops/binary.rs Show resolved Hide resolved
Copy link

codecov bot commented Jan 7, 2025

Codecov Report

Attention: Patch coverage is 84.63303% with 67 lines in your changes missing coverage. Please review.

Project coverage is 76.87%. Comparing base (39bb62c) to head (c887532).
Report is 22 commits behind head on main.

Files with missing lines Patch % Lines
src/daft-core/src/series/ops/binary.rs 33.33% 26 Missing ⚠️
src/daft-core/src/array/ops/binary.rs 94.91% 12 Missing ⚠️
src/daft-functions/src/binary/concat.rs 84.21% 12 Missing ⚠️
src/daft-functions/src/binary/slice.rs 71.05% 11 Missing ⚠️
src/daft-functions/src/binary/length.rs 80.64% 6 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3646      +/-   ##
==========================================
- Coverage   77.99%   76.87%   -1.13%     
==========================================
  Files         720      733      +13     
  Lines       88794    93056    +4262     
==========================================
+ Hits        69252    71533    +2281     
- Misses      19542    21523    +1981     
Files with missing lines Coverage Δ
daft/expressions/expressions.py 93.62% <100.00%> (+0.10%) ⬆️
src/daft-core/src/series/ops/mod.rs 100.00% <ø> (ø)
src/daft-functions/src/lib.rs 0.00% <ø> (ø)
src/daft-functions/src/python/mod.rs 100.00% <100.00%> (ø)
src/daft-functions/src/binary/length.rs 80.64% <80.64%> (ø)
src/daft-functions/src/binary/slice.rs 71.05% <71.05%> (ø)
src/daft-core/src/array/ops/binary.rs 94.91% <94.91%> (ø)
src/daft-functions/src/binary/concat.rs 84.21% <84.21%> (ø)
src/daft-core/src/series/ops/binary.rs 33.33% <33.33%> (ø)

... and 300 files with indirect coverage changes

@f4t4nt f4t4nt force-pushed the nishant-binary-string branch from 6338308 to cf73ba2 Compare January 7, 2025 19:52
@f4t4nt f4t4nt force-pushed the nishant-binary-string branch from 2deddac to 1c26a63 Compare January 8, 2025 21:54
@f4t4nt f4t4nt requested a review from kevinzwang January 9, 2025 01:01
Copy link
Member

@kevinzwang kevinzwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far! In addition to the other comments there's two things you should also do:

  • add these functions to our docs by adding a section in docs/source/api_docs/expressions.rst
  • implement this for the FixedSizeBinaryArray type as well

src/daft-core/src/series/ops/binary.rs Show resolved Hide resolved
src/daft-functions/src/python/binary.rs Outdated Show resolved Hide resolved
tests/table/utf8/test_concat.py Outdated Show resolved Hide resolved
@f4t4nt f4t4nt requested a review from kevinzwang January 14, 2025 18:28
Copy link
Member

@kevinzwang kevinzwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far! I had some comments on things to clean up.

daft/expressions/expressions.py Outdated Show resolved Hide resolved
pyproject.toml Show resolved Hide resolved
src/daft-core/src/array/ops/binary.rs Outdated Show resolved Hide resolved
src/daft-core/src/array/ops/binary.rs Outdated Show resolved Hide resolved
src/daft-core/src/array/ops/binary.rs Outdated Show resolved Hide resolved
src/daft-functions/src/binary/concat.rs Show resolved Hide resolved
src/daft-functions/src/binary/length.rs Show resolved Hide resolved
src/daft-functions/src/binary/slice.rs Show resolved Hide resolved
tests/table/binary/test_fixed_size_binary_concat.py Outdated Show resolved Hide resolved
tests/table/binary/test_concat.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants