-
Notifications
You must be signed in to change notification settings - Fork 514
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(fix #5472) Set desiredBundleSizeBytes to Long.MaxValue BinaryIO reads to prevent file subrange-splitting #5473
Conversation
…eads to prevent file-splitting
@@ -326,6 +327,9 @@ object BinaryIO { | |||
false | |||
} | |||
} | |||
|
|||
override def allowsDynamicSplitting(): Boolean = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is redundant (it's inherited from FileBasedSource#isSplittable
if not set) but I noticed that Beam IOs set this just to be explicit, so why not
@@ -344,8 +348,14 @@ object BinaryIO { | |||
fileMetadata: Metadata, | |||
start: Long, | |||
end: Long | |||
): FileBasedSource[Array[Byte]] = | |||
): FileBasedSource[Array[Byte]] = { | |||
Preconditions.checkArgument( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also copied from a non-splittable Beam IO: https://github.com/apache/beam/blob/v2.58.1/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L497
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in scala land, we've always used require
. IMHO we should keep checkArgument
for java land
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, thanks!
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5473 +/- ##
==========================================
- Coverage 61.30% 61.29% -0.01%
==========================================
Files 312 312
Lines 11068 11072 +4
Branches 792 758 -34
==========================================
+ Hits 6785 6787 +2
- Misses 4283 4285 +2 ☔ View full report in Codecov by Sentry. |
for context see #5472.