insertRows() returns no errors when connection is lost #802

Open
aronsemle opened this issue Jul 30, 2024 · 5 comments

Comments

@aronsemle

aronsemle commented Jul 30, 2024

The customer is sending OT data to Snowflake and wants to ensure that all of it arrives. They use a local store-and-forward feature in our app: we store the data to disk and delete it once it has been sent successfully. If the send fails, we retry.

Issue
The SDK's insertRows call keeps returning no errors after the internet connection is forcibly killed. It takes minutes before it starts reporting failed inserts.

Expected
A faster way to detect a connection failure, so that applications can guarantee data delivery.

Reproduce

  1. Create a connection and start sending data using the following API call:
    var response = channel.insertRows(rowInserts, null);
  2. Kill the internet connection.
  3. The insertRows call keeps returning no errors and throwing no exceptions for minutes.
  4. Eventually isClosed() returns true and the calls fail.
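To make the reported behavior concrete, here is a minimal simulation in plain Java. It does not use the Snowflake SDK: `FakeChannel` and `ReproDemo` are hypothetical stand-ins that mimic what this thread describes. `insertRows` only queues rows in memory (returning a boolean in place of the SDK's `InsertValidationResponse`), a periodic flush does the upload, and a lost connection shows up only as a committed offset that stops advancing until the channel is eventually closed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a streaming channel; this is NOT the Snowflake SDK.
// It mimics the behavior described in this issue: insertRows only buffers rows in
// memory, and only a later flush, which needs the network, advances the committed offset.
class FakeChannel {
    private final List<Long> buffered = new ArrayList<>(); // offset tokens waiting to be flushed
    private Long committedOffset = null;                   // last offset acknowledged by the "server"
    private boolean networkUp = true;
    private boolean closed = false;

    // Returns true if the rows were rejected. In this model rows are only checked
    // locally, so a dead network does not surface as an error here.
    boolean insertRows(List<Map<String, Object>> rows, long offsetToken) {
        if (closed) {
            throw new IllegalStateException("channel is closed");
        }
        buffered.add(offsetToken);
        return false;
    }

    // One tick of the background flush (about every second by default, per this thread).
    void flush() {
        if (closed || buffered.isEmpty() || !networkUp) {
            return; // on a dead network the upload fails and rows stay queued for retry
        }
        committedOffset = buffered.get(buffered.size() - 1);
        buffered.clear();
    }

    void setNetworkUp(boolean up) { networkUp = up; }
    void giveUp() { closed = true; } // after minutes of failed retries the channel is closed
    boolean isClosed() { return closed; }
    Long getLatestCommittedOffsetToken() { return committedOffset; }
}

class ReproDemo {
    public static void main(String[] args) {
        FakeChannel channel = new FakeChannel();
        List<Map<String, Object>> rows = List.of(Map.<String, Object>of("tag", "temp", "value", 21.5));
        channel.insertRows(rows, 1);
        channel.flush();                                // offset 1 is committed
        channel.setNetworkUp(false);                    // step 2: kill the connection
        boolean rejected = channel.insertRows(rows, 2); // step 3: still no error
        channel.flush();                                // upload fails silently
        System.out.println(rejected + " " + channel.getLatestCommittedOffsetToken()); // prints "false 1"
    }
}
```

The point of the model is that "no errors from insertRows" only means the rows were accepted into the client's buffer, not that they reached Snowflake.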

Is there a workaround, or am I using the SDK incorrectly?

@sfc-gh-tzhang
Contributor

Hi, this is the nature of async processing: we wait for a few minutes in case the problem is just a network glitch, and the rows stay queued in memory until the connection is back. What you need to do is use getLatestCommittedOffsetToken before deleting any source data. Also see this general topic about offset tokens.
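A minimal sketch of that advice applied to a store-and-forward buffer. Assumptions: offset tokens are increasing integers encoded as strings (the SDK treats the token as an opaque string, so the numeric ordering is this sketch's choice), `PendingStore` is a hypothetical class, the entries stand in for batch files on disk, and the argument to `releaseCommitted` would come from polling `channel.getLatestCommittedOffsetToken()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical store-and-forward buffer. Each entry stands in for an on-disk batch,
// keyed by the offset token that was passed to insertRows(rows, offsetToken).
class PendingStore {
    private final TreeMap<Long, String> pending = new TreeMap<>(); // offset -> batch id (e.g. file name)
    private long nextOffset = 1;

    // Record a batch before sending it; returns the offset token to pass to insertRows.
    String stage(String batchId) {
        long offset = nextOffset++;
        pending.put(offset, batchId);
        return Long.toString(offset);
    }

    // Call periodically with the latest committed offset token.
    // Batches at or below it are committed and can be deleted from disk;
    // anything above it must be kept until it is committed.
    List<String> releaseCommitted(String committedToken) {
        List<String> released = new ArrayList<>();
        if (committedToken == null) {
            return released; // nothing committed yet
        }
        long committed = Long.parseLong(committedToken);
        Map<Long, String> done = pending.headMap(committed, true); // live view of offsets <= committed
        released.addAll(done.values());
        done.clear(); // clearing the view removes these entries from `pending`
        return released;
    }

    int pendingCount() { return pending.size(); }

    // Batches not yet confirmed as committed.
    List<String> unacknowledged() { return new ArrayList<>(pending.values()); }
}
```

With this split, a successful insertRows call no longer triggers deletion; only an advancing committed offset does, so data queued during an outage stays on disk.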

@aronsemle
Author

Thanks @sfc-gh-tzhang, I was able to implement this. That said, I think exposing a way to say "send now", or a way to remove or control the thread that sends out the data every second, would be helpful. In cases where you need guaranteed delivery, the current approach introduces up to 1 second of delay. That seems small, but when you're sending a lot of data it adds up.
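One way to quantify "it adds up", under a deliberately rough model (an assumption, not a measurement): a backlog of batches is waiting on disk after an outage, each flush commits everything buffered so far, and upload time and any per-flush size limits are ignored. `BacklogDrain` is a hypothetical helper. If the sender waits for each batch's commit before inserting the next, every batch pays up to one flush interval; if it inserts them all and deletes them as commits arrive, the interval is paid roughly once.

```java
// Rough model for draining a backlog of batches already stored on disk.
// Assumes each flush commits everything buffered so far; ignores upload time and size limits.
class BacklogDrain {
    // Wait for each batch's commit before inserting the next: up to one flush interval per batch.
    static long stopAndWaitMillis(int batches, long flushIntervalMillis) {
        return (long) batches * flushIntervalMillis;
    }

    // Insert every batch right away and delete them as commits arrive: about one interval in total.
    static long pipelinedMillis(int batches, long flushIntervalMillis) {
        return batches == 0 ? 0 : flushIntervalMillis;
    }
}
```

For example, with a 1 s interval and 3,600 backlogged batches, stop-and-wait takes up to 3,600,000 ms (one hour) in this model, while the pipelined approach needs about 1,000 ms.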

@sfc-gh-tzhang
Contributor

Take a look at https://docs.snowflake.com/en/user-guide/data-load-snowpipe-streaming-overview#latency; basically, you can control the flush interval via the MAX_CLIENT_LAG parameter.
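Since commits normally trail inserts by about one flush interval, an application can treat a much longer gap as a sign of trouble instead of waiting minutes for isClosed(). The sketch below is a hypothetical application-side watchdog, not an SDK feature: `CommitWatchdog`, numeric offset tokens, and the stall threshold (for example a few multiples of MAX_CLIENT_LAG) are all assumptions. It flags a stall when inserted data has gone uncommitted for longer than the threshold, using values polled from `getLatestCommittedOffsetToken()`.

```java
// Hypothetical application-side watchdog (not part of the Snowflake SDK).
// Offsets are assumed to be increasing numbers; time is passed in explicitly so the
// logic is easy to test. A stall means: something was inserted, but the committed
// offset has not advanced for longer than `stallMillis`.
class CommitWatchdog {
    private final long stallMillis;
    private long lastInserted = -1;
    private long lastCommitted = -1;
    private long lastProgressAt; // when the committed offset last advanced, or when pending data first appeared

    CommitWatchdog(long stallMillis, long nowMillis) {
        this.stallMillis = stallMillis;
        this.lastProgressAt = nowMillis;
    }

    void onInserted(long offset, long nowMillis) {
        if (lastInserted <= lastCommitted) {
            lastProgressAt = nowMillis; // nothing was pending before, so start the clock now
        }
        lastInserted = Math.max(lastInserted, offset);
    }

    // Feed this with the value polled from getLatestCommittedOffsetToken().
    void onCommitted(long offset, long nowMillis) {
        if (offset > lastCommitted) {
            lastCommitted = offset;
            lastProgressAt = nowMillis;
        }
    }

    boolean isStalled(long nowMillis) {
        boolean pending = lastInserted > lastCommitted;
        return pending && nowMillis - lastProgressAt > stallMillis;
    }
}
```

A stall does not mean data was lost; it only means the app may want to alert or reopen the channel early, while the offset-gated deletion keeps unconfirmed batches on disk.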

@aronsemle
Author

I found this, but it looks like it can't be set lower than 1 second. Is that right?

@sfc-gh-tzhang
Contributor

That's correct, the minimum is 1s for now.
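A hedged sketch of setting the flush interval discussed above. The key name `MAX_CLIENT_LAG` comes from the latency docs linked in this thread; passing it as a client `Properties` entry and the "1 second" / "N seconds" value format are assumptions to verify against those docs. `ClientLagConfig` is a hypothetical helper that clamps requests to the 1 s minimum reported here.

```java
import java.util.Properties;

// Hypothetical helper: copies client properties and sets MAX_CLIENT_LAG.
// The property key and value format are assumptions to check against the Snowflake docs.
class ClientLagConfig {
    static final long MIN_LAG_SECONDS = 1; // minimum reported in this thread

    static Properties withClientLag(Properties base, long requestedSeconds) {
        Properties props = new Properties();
        props.putAll(base); // copy so the caller's Properties is not mutated
        long seconds = Math.max(requestedSeconds, MIN_LAG_SECONDS); // values below 1 s are clamped
        props.setProperty("MAX_CLIENT_LAG", seconds + (seconds == 1 ? " second" : " seconds"));
        return props;
    }
}
```

Clamping locally just makes the 1 s floor explicit in application code; it does not make flushes any faster than the SDK allows.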
