snowflake: streaming PUTs to internal stages #50

Open
jgraettinger opened this issue Oct 13, 2021 · 1 comment
Labels
enhancement New feature or request

Comments

@jgraettinger
Member

Not long ago, I attempted to update the connector to use the gosnowflake driver's recently added support for PUTs to Snowflake stages. I ran into a bug that has since been fixed, and we should try again.

While we're at it, we should switch to streaming PUTs to the internal stage as we consume from the Store iterator, instead of staging to a local temporary file and starting the upload only after the Store iterator is consumed. My own profiling suggests this would materially reduce data stalls during transaction execution, since these PUTs typically take several seconds to complete for larger files.

As further context, our philosophy on connector errors and retries has shifted: we're planning to implement a watchdog in the control plane that looks for failed shards and restarts them with a backoff policy. That means the connector doesn't need to worry about spurious errors and retries while executing the PUT to Snowflake; it can take the more efficient, direct approach of simply streaming to the stage and hoping for the best.

@jgraettinger jgraettinger added the enhancement New feature or request label Oct 13, 2021
@williamhbaker
Member

I attempted to implement streaming PUTs, and while they do work now, the memory usage is impractical because the gosnowflake driver reads the entire stream into memory (see snowflakedb/gosnowflake#536). Once this is resolved, it should be straightforward to switch to streaming PUTs.
