Add Batched lookups for streaming GRPC endpoints and BigTable #5521
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5521      +/-   ##
==========================================
+ Coverage   61.42%   61.48%   +0.05%
==========================================
  Files         312      312
  Lines       11104    11121      +17
  Branches      757      796      +39
==========================================
+ Hits         6821     6838      +17
  Misses       4283     4283
When getting a response for a batch request, return an UnmatchedRequestException for unmatched requests
I think we can keep the same API for that case. When we receive a batch response that is missing some entries from the batch request, we should fail fast and report those entries. It is then up to the user to filter out those exceptions if this is expected behavior. See #5532
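To illustrate the behavior discussed in this comment, here is a minimal sketch of matching a partial batch response back to its requests. It is not the Scio implementation: `BatchMatchSketch`, `Result`, `match`, and the id extractors are hypothetical names, and `UnmatchedRequest` is only a stand-in for the `UnmatchedRequestException` mentioned in the thread. The sketch assumes that every request and response exposes a string id.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; none of these names come from the Scio API.
final class BatchMatchSketch {

    /** Stand-in failure for a request that the batch response did not cover. */
    static final class UnmatchedRequest extends RuntimeException {
        UnmatchedRequest(String id) {
            super("No response for request id: " + id);
        }
    }

    /** Holds either a matched response or the failure explaining its absence. */
    static final class Result<T> {
        final T value;
        final Throwable failure;

        private Result(T value, Throwable failure) {
            this.value = value;
            this.failure = failure;
        }

        static <T> Result<T> success(T value) {
            return new Result<>(value, null);
        }

        static <T> Result<T> failure(Throwable failure) {
            return new Result<>(null, failure);
        }

        boolean isSuccess() {
            return failure == null;
        }
    }

    /**
     * Pairs each request with its response by id. Requests without a
     * response get a failed Result instead of being silently dropped.
     */
    static <Req, Resp> Map<Req, Result<Resp>> match(
            List<Req> requests,
            List<Resp> responses,
            Function<Req, String> requestId,
            Function<Resp, String> responseId) {
        // Index the responses by id so each request is resolved in O(1).
        Map<String, Resp> byId = new HashMap<>();
        for (Resp response : responses) {
            byId.put(responseId.apply(response), response);
        }
        // LinkedHashMap keeps the results in request order.
        Map<Req, Result<Resp>> results = new LinkedHashMap<>();
        for (Req request : requests) {
            String id = requestId.apply(request);
            Resp response = byId.get(id);
            results.put(request, response != null
                    ? Result.success(response)
                    : Result.failure(new UnmatchedRequest(id)));
        }
        return results;
    }
}
```

With this shape, a caller that expects gaps can filter on `isSuccess()`, while a caller that treats gaps as errors can rethrow the stored failure.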
The rule of thumb I tried to apply here is: fail if the client you're using throws an exception. BigTable not having an entry for a key is something you would expect to happen. But we can merge your PR. And the error handling can be added to the
Force-pushed from 46805a9 to 5dcab72
@RustedBones I've gone ahead and rebased this branch on your PR #5532 and dropped the
Add batched version of grpcLookupStream and BigTableDoFn
Force-pushed from a8368cf to d39cba0
LGTM. Thanks for the contribution. It makes a lot of sense to handle missing responses in a batch leniently instead of failing the job.
import com.google.cloud.bigtable.config.BigtableOptions;
import com.google.cloud.bigtable.grpc.BigtableSession;
As a heads-up, we will probably move to the BigtableDataClient in an upcoming minor release. Expect some breaking changes in that part.
This aims to extend the usability of AsyncBatchLookupDoFn. It adds:
- grpcLookupBatchStream for batched GRPC endpoints with a streaming response.
- BigTableBatchDoFn for batch calls to BigTable, including a cache.

To make this happen, AsyncBatchLookupDoFn had to be changed to gracefully handle partial batched responses.
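
As a rough illustration of the idea in the description (batched lookups with a cache that tolerate partial responses), here is a minimal sketch. It is not the code from this PR: `CachedBatchLookupSketch`, `batchFetch`, and `backendCalls` are hypothetical names, the backend is modeled as a plain function from a list of keys to a map of found entries, and the cache is an unbounded in-memory map chosen only to keep the example short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; not the BigTableBatchDoFn from this PR.
final class CachedBatchLookupSketch<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    // Assumed backend: takes a batch of keys, returns only the entries it found.
    private final Function<List<K>, Map<K, V>> batchFetch;
    // Counts backend round trips, only to make the caching effect observable.
    int backendCalls = 0;

    CachedBatchLookupSketch(Function<List<K>, Map<K, V>> batchFetch) {
        this.batchFetch = batchFetch;
    }

    /**
     * Returns the values found for the given keys. Keys the backend does not
     * know are simply absent from the result (the lenient behavior).
     * Cached hits come first in the result, followed by freshly fetched entries.
     */
    Map<K, V> lookup(List<K> keys) {
        Map<K, V> found = new LinkedHashMap<>();
        List<K> misses = new ArrayList<>();
        for (K key : keys) {
            V cached = cache.get(key);
            if (cached != null) {
                found.put(key, cached);
            } else if (!misses.contains(key)) {
                misses.add(key); // deduplicate keys within one batch
            }
        }
        if (!misses.isEmpty()) {
            backendCalls++;
            // One batched call for all cache misses.
            Map<K, V> fetched = batchFetch.apply(misses);
            for (K key : misses) {
                V value = fetched.get(key);
                if (value != null) {
                    cache.put(key, value);
                    found.put(key, value);
                }
            }
        }
        return found;
    }
}
```

Only keys that miss the cache are sent to the backend, in a single batch, and a response that covers just part of the batch still yields the entries that were found. A real implementation would also need cache eviction and asynchronous calls, which this sketch leaves out.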