(fix #5472) Set desiredBundleSizeBytes to Long.MaxValue in BinaryIO reads to prevent file subrange-splitting #5473

Merged
merged 2 commits into main from binary-io-fix
Sep 10, 2024

clairemcginty
Contributor

For context, see #5472.
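As a rough illustration of the reasoning in the title: a file-based read cuts each file into byte ranges of roughly desiredBundleSizeBytes, so a desired size of Long.MaxValue leaves every file as a single [0, size) range. The sketch below is a simplified, hypothetical model (ByteRange and BundleSplitModel are illustrative names, not Beam or Scio APIs) and omits refinements that Beam's real splitter applies.

```scala
// Hypothetical, simplified model of splitting one file into byte-range bundles.
// ByteRange and BundleSplitModel are illustrative names, not Beam or Scio APIs.
final case class ByteRange(start: Long, end: Long)

object BundleSplitModel {
  // Cuts [0, fileSize) into consecutive ranges of at most desiredBundleSizeBytes
  def split(fileSize: Long, desiredBundleSizeBytes: Long): List[ByteRange] = {
    require(fileSize >= 0L, "fileSize must be non-negative")
    require(desiredBundleSizeBytes > 0L, "desiredBundleSizeBytes must be positive")
    val ranges = List.newBuilder[ByteRange]
    var start = 0L
    while (start < fileSize) {
      // Compare against the remaining bytes so start + Long.MaxValue never overflows
      val end =
        if (desiredBundleSizeBytes >= fileSize - start) fileSize
        else start + desiredBundleSizeBytes
      ranges += ByteRange(start, end)
      start = end
    }
    ranges.result()
  }
}
```

In this model a 200-byte file with a 64-byte target becomes four ranges, while a target of Long.MaxValue keeps it as one whole-file range, which is what a reader that cannot start mid-file needs.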

clairemcginty requested a review from kellen September 5, 2024 20:30
@@ -326,6 +327,9 @@ object BinaryIO {
false
}
}

override def allowsDynamicSplitting(): Boolean =

Contributor Author

This is redundant (it's inherited from FileBasedSource#isSplittable if not set), but I noticed that Beam IOs set it explicitly, so why not.
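The redundancy described in this comment can be modelled in a few lines. ModelFileSource, ModelFileReader and the other names below are hypothetical stand-ins, not Beam classes; the only behaviour assumed is the one the comment states, namely that the reader's default follows isSplittable.

```scala
// Hypothetical stand-ins for a file source and its reader; names are illustrative only.
trait ModelFileSource {
  def isSplittable: Boolean
}

abstract class ModelFileReader(source: ModelFileSource) {
  // Default mirrors the comment: dynamic splitting follows the source's isSplittable
  def allowsDynamicSplitting: Boolean = source.isSplittable
}

// A source that must always be read as whole files
object WholeFileSource extends ModelFileSource {
  override def isSplittable: Boolean = false
}

// Relies on the inherited default
final class ImplicitReader extends ModelFileReader(WholeFileSource)

// States the same answer explicitly, as the comment says Beam IOs do
final class ExplicitReader extends ModelFileReader(WholeFileSource) {
  override def allowsDynamicSplitting: Boolean = false
}
```

Both readers report false; the explicit override changes nothing at runtime and only documents intent.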

@@ -344,8 +348,14 @@ object BinaryIO {
fileMetadata: Metadata,
start: Long,
end: Long
-): FileBasedSource[Array[Byte]] =
+): FileBasedSource[Array[Byte]] = {
+Preconditions.checkArgument(

Contributor Author

Contributor

In Scala land, we've always used require. IMHO we should keep checkArgument for Java land.
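For reference, Scala's Predef.require throws an IllegalArgumentException with a message prefixed by "requirement failed: ", the same exception type Guava's checkArgument throws, so switching keeps the failure mode while dropping the Guava call. The condition below (a subrange must start at offset 0) is an assumption for illustration only, since the PR's actual check is not shown in this thread, and SubrangeCheck is a hypothetical name.

```scala
// Hypothetical helper; the real condition used in BinaryIO is not shown here.
object SubrangeCheck {
  // Scala idiom: Predef.require rather than Guava's Preconditions.checkArgument
  def requireStartsAtZero(start: Long, end: Long): Unit =
    require(start == 0L, s"expected a range starting at offset 0, got [$start, $end)")
}
```

A call such as SubrangeCheck.requireStartsAtZero(10L, 100L) fails with "requirement failed: expected a range starting at offset 0, got [10, 100)".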

Contributor Author

fixed, thanks!

codecov bot commented Sep 5, 2024

Codecov Report

Attention: Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 61.29%. Comparing base (e0a0259) to head (47c068d).
Report is 12 commits behind head on main.

Files with missing lines                                   Patch %   Lines
.../src/main/scala/com/spotify/scio/io/BinaryIO.scala      75.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5473      +/-   ##
==========================================
- Coverage   61.30%   61.29%   -0.01%     
==========================================
  Files         312      312              
  Lines       11068    11072       +4     
  Branches      792      758      -34     
==========================================
+ Hits         6785     6787       +2     
- Misses       4283     4285       +2     
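The -0.01% change follows directly from the hit and line counts above: the four new lines add two hits and two misses, which pulls the ratio just below 61.30%. Assuming Codecov's default of rounding down to two decimals, the snippet below reproduces the reported figures; CoverageMath.coveragePct is a hypothetical helper, not part of any Codecov API.

```scala
import scala.math.BigDecimal.RoundingMode

object CoverageMath {
  // Percentage of covered lines, rounded down (truncated) to two decimals
  def coveragePct(hits: Long, lines: Long): BigDecimal =
    (BigDecimal(hits) * BigDecimal(100) / BigDecimal(lines)).setScale(2, RoundingMode.DOWN)
}
```

With these inputs, 6785 / 11068 gives 61.30 and 6787 / 11072 gives 61.29 (about 61.2988 before truncation), matching the base and head columns.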

clairemcginty merged commit d485c98 into main Sep 10, 2024
12 checks passed
clairemcginty deleted the binary-io-fix branch September 10, 2024 19:00

2 participants