Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Batched lookups for streaming GRPC endpoints and BigTable #5521

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

lofifnc
Copy link
Contributor

@lofifnc lofifnc commented Nov 4, 2024

This aims to extend the usability for AsyncBatchLookupDoFn it adds:

  • grpcLookupBatchStream for batched GRPC endpoints with streaming response.
  • BigTableBatchDoFn for batch calls to BigTable including a cache.
    To make this happen the AsyncBatchLookupDoFn had to be changed to gracefully handle partial batched responses.

@lofifnc lofifnc changed the title WIP: AsyncBatchLookup consistency WIP: BaseAsyncBatchLookupDoFn consistency Nov 4, 2024
Copy link

codecov bot commented Nov 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 61.48%. Comparing base (cc1e584) to head (d39cba0).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5521      +/-   ##
==========================================
+ Coverage   61.42%   61.48%   +0.05%     
==========================================
  Files         312      312              
  Lines       11104    11121      +17     
  Branches      757      796      +39     
==========================================
+ Hits         6821     6838      +17     
  Misses       4283     4283              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@lofifnc lofifnc marked this pull request as ready for review November 27, 2024 14:08
@lofifnc lofifnc changed the title WIP: BaseAsyncBatchLookupDoFn consistency Add Batched lookups for streaming GRPC endpoints and BigTable Nov 28, 2024
When getting an response for a batch request, return an
UnmatchedRequestException for unmatched requests
@RustedBones
Copy link
Contributor

grpcSafeBatchLookup for calling endpoints which don't include missing values into the batch response.

I think we can keep the same API for that case. When receiving a batch response missing some entries form the batch request, we should fail fast and report those. It is then up to the user to filter out those exceptions if this is an expected behavior. See #5532

@lofifnc
Copy link
Contributor Author

lofifnc commented Nov 29, 2024

grpcSafeBatchLookup for calling endpoints which don't include missing values into the batch response.

I think we can keep the same API for that case. When receiving a batch response missing some entries form the batch request, we should fail fast and report those. It is then up to the user to filter out those exceptions if this is an expected behavior. See #5532

The rule of thump I tried to apply here is fail, if the client you're using is throwing an exception. BigTable not having an entry for a key is kind of something expected.
If we just return a failure I know, that I would make the mistake not adding handling for that and then on the first job run discovering that I'm getting a bunch of exceptions.

But we can merge your PR. And the error handling can be added to the . grpcSafeBatchLookup.

@lofifnc lofifnc force-pushed the batched-bigtable-lookup branch from 46805a9 to 5dcab72 Compare November 29, 2024 07:15
@lofifnc
Copy link
Contributor Author

lofifnc commented Nov 29, 2024

@RustedBones I've gone ahead and rebased this brach based on your PR #5532 and dropped the .safeGrpcLookup for now. This resolves all contentious parts, at least for me.

Add batched version of  grpcLookupStream and BigTableDoFn
@lofifnc lofifnc force-pushed the batched-bigtable-lookup branch from a8368cf to d39cba0 Compare November 29, 2024 08:34
Copy link
Contributor

@RustedBones RustedBones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the contribution. It makes a lot of sense to handle missing response in batch in a lenient way instead of failing the job.

Comment on lines +20 to +21
import com.google.cloud.bigtable.config.BigtableOptions;
import com.google.cloud.bigtable.grpc.BigtableSession;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a heads-up, we will probably move to the BigtableDataClient in some next minor release, see here. Expect some breaking changes in that part.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants