
SNOW-536706: Binding Parameters to Array Variables #540

Closed

mihaitodor opened this issue Jan 28, 2022 · 3 comments

Comments


mihaitodor commented Jan 28, 2022

Issue description

I'm unable to figure out the right query for doing a bulk insert of n values using array binding, where 1 < n < CLIENT_STAGE_ARRAY_BINDING_THRESHOLD. Using the debugger, I determined that CLIENT_STAGE_ARRAY_BINDING_THRESHOLD is set to 65280.

I get an error if I try to run ALTER SESSION SET CLIENT_STAGE_ARRAY_BINDING_THRESHOLD = 1, but I don't think I actually need to change this parameter, and I'm guessing it's just some internal field that you don't intend to expose (it doesn't show up when I run SHOW PARAMETERS, but I saw it in your tests), so let's set that aside for now.

The problem is that I can't figure out the right way to insert multiple records into a table with a VARIANT column. I tried multiple options, but none of them works in all cases, and I'd like to avoid hardcoding 65280 as a threshold, since it's not a documented limit. I suppose I could build a plain SQL statement without bound parameters when I need to insert between 1 and 65279 rows (a sketch of it follows below), but that's a really nasty hack...
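For illustration, the hack I'm referring to would look something like this. It's only a sketch: the package name is made up, the quoting is naive, and it would need proper escaping before real use:

package snowhack // hypothetical package name, just for this sketch

import (
	"database/sql"
	"fmt"
	"strings"
)

// insertWithoutBinding sidesteps parameter binding (and therefore the
// threshold) entirely by inlining each record as a string literal.
func insertWithoutBinding(db *sql.DB, records []string) error {
	values := make([]string, 0, len(records))
	for _, r := range records {
		// Naive escaping: only doubles single quotes inside the JSON text.
		values = append(values, fmt.Sprintf("('%s')", strings.ReplaceAll(r, "'", "''")))
	}
	query := "INSERT INTO MIHAI SELECT parse_json(column1) FROM VALUES " + strings.Join(values, ",")
	_, err := db.Exec(query)
	return err
}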

Please advise.

Example code

I am using the following query to create a table:

CREATE OR REPLACE TABLE MIHAI(RECORD variant);

And the following Go code:

recordArray := strings.Split(strings.TrimSuffix(strings.Repeat(`{"abc":"def"},`, rows), ","), ",")
_, err = db.Query(`INSERT INTO MIHAI VALUES(parse_json(?))`, gosnowflake.Array(recordArray))

If I set rows to 1, it fails with this log:
out1.log
(SQL compilation error: Invalid expression [PARSE_JSON(?)] in VALUES clause)

If I set rows to any number between 2 and 65279, it fails with these logs:
out2.log
out65279.log
(SQL compilation error: error line 1 at position 36 Bind variable ? not set.)

If I set rows to 65280, it succeeds with this log:
out65280.log
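For reference, the full program around these snippets is roughly the following (the SNOWFLAKE_DSN environment variable is my own placeholder; only the query line changes between the two variants):

package main

import (
	"database/sql"
	"log"
	"os"
	"strings"

	"github.com/snowflakedb/gosnowflake"
)

func main() {
	// DSN shaped like "user:password@account/database/schema"; importing
	// gosnowflake registers the "snowflake" driver with database/sql.
	db, err := sql.Open("snowflake", os.Getenv("SNOWFLAKE_DSN"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	rows := 2 // varied between runs: 1, 2, 65279, 65280

	recordArray := strings.Split(strings.TrimSuffix(strings.Repeat(`{"abc":"def"},`, rows), ","), ",")
	// Mirrors the snippet above; in real code the returned rows should be closed.
	if _, err = db.Query(`INSERT INTO MIHAI VALUES(parse_json(?))`, gosnowflake.Array(recordArray)); err != nil {
		log.Fatal(err)
	}
}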

Then I tried to change the Go code to this:

recordArray := strings.Split(strings.TrimSuffix(strings.Repeat(`{"abc":"def"},`, rows), ","), ",")
_, err = db.Query(`INSERT INTO MIHAI SELECT parse_json(column1) FROM VALUES (?)`, gosnowflake.Array(recordArray))

If I set rows to 1, it succeeds with this log:
out1.log

But, if I set rows between 2 and 65279, it fails with these logs:
out2.log
out65279.log
(SQL compilation error: Array bind currently not supported for this query type.)

And, finally, if I set rows to 65280, it fails with this log:
out65280.log
(SQL compilation error: error line 1 at position 58 Bind variable ? not set.)

Error log

See attached inline above

Configuration

Driver version (or git SHA): v1.6.6

Go version: 1.17.5

Server version: 6.1.1

Client OS: macOS Catalina 10.15.7

github-actions bot changed the title from "Binding Parameters to Array Variables" to "SNOW-536706: Binding Parameters to Array Variables" on Jan 28, 2022
@sfc-gh-jbahk (Collaborator)

Hi @mihaitodor, thank you for your inquiry. The long and short of it is that the bulk array binding threshold is used internally to decide whether to simply bind the array through a conventional REST request or to go through a staging mechanism that achieves the same thing; it's not something we leave configurable to the user.

As for variant binding, we currently don't support it, which is consistent with the failures you see once the array has more than one element.

Please let me know if this answers your questions! The first part is something that we've decided not to expose, and the second is in our backlog but not currently available.
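To illustrate the decision (a simplified sketch, not the actual driver code; the helper and its output strings are just descriptive):

package main

import "fmt"

// Below the threshold, binds travel inline in the REST request; at or above
// it, the bind data is written out and loaded via a temporary stage.
const clientStageArrayBindingThreshold = 65280

func bindPath(numRows int) string {
	if numRows >= clientStageArrayBindingThreshold {
		return "stage path: PUT bind data to a temporary stage, then COPY"
	}
	return "inline path: send binds in the REST request body"
}

func main() {
	for _, n := range []int{1, 2, 65279, 65280} {
		fmt.Printf("%d rows -> %s\n", n, bindPath(n))
	}
}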


mihaitodor commented Feb 11, 2022

@sfc-gh-jbahk Thank you for the update! I guess that makes sense, and I’m looking forward to having support for variant binding. It would be great to get an update once that’s done; should I open a separate issue for that? Also, is there an explanation for the discrepancies I described above between the two queries? To me, they should behave identically.

I’m somewhat surprised that you don’t want to let users tweak the threshold for creating a temporary stage. It makes me uncomfortable about relying on this insert-based mechanism for bulk uploads, because you could change the threshold without notice, and that would have a big impact on performance. This leads me to think that you prefer an explicit Snowpipe as the default mechanism for bulk inserts, but, unfortunately, this library doesn’t offer much support for that: I had to do quite a bit of digging to figure out how to construct a valid JWT, and then I had to construct the REST endpoint manually (a sketch of what I ended up with follows below). It’s not a huge burden, and I’ll upstream my work as open source to https://benthos.dev (open issue here), but since there’s no guidance in the docs on which mechanism to use, I wasn’t sure which path to follow. I can add support for both, but I’d like to write some clear guidance for future users.
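For anyone who finds this later, here is roughly the shape of what I ended up with. It's a sketch, not the exact code I'm upstreaming: the package name is made up, account/user/pipe names are placeholders, and it assumes github.com/golang-jwt/jwt/v4 for the signing:

package snowpipe // hypothetical package name, just for this sketch

import (
	"crypto/rsa"
	"crypto/sha256"
	"crypto/x509"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
	"time"

	"github.com/golang-jwt/jwt/v4"
)

// buildJWT constructs the key-pair auth token Snowflake expects: the issuer
// is "<ACCOUNT>.<USER>.SHA256:<public key fingerprint>", the subject is
// "<ACCOUNT>.<USER>", and the token is signed with RS256.
func buildJWT(account, user string, key *rsa.PrivateKey) (string, error) {
	pubDER, err := x509.MarshalPKIXPublicKey(&key.PublicKey)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(pubDER)
	fingerprint := base64.StdEncoding.EncodeToString(sum[:])

	now := time.Now()
	claims := jwt.MapClaims{
		"iss": fmt.Sprintf("%s.%s.SHA256:%s", strings.ToUpper(account), strings.ToUpper(user), fingerprint),
		"sub": fmt.Sprintf("%s.%s", strings.ToUpper(account), strings.ToUpper(user)),
		"iat": now.Unix(),
		"exp": now.Add(59 * time.Minute).Unix(),
	}
	return jwt.NewWithClaims(jwt.SigningMethodRS256, claims).SignedString(key)
}

// insertFiles calls the Snowpipe REST endpoint, telling the fully qualified
// pipe (DB.SCHEMA.PIPE) which already-staged files to ingest.
func insertFiles(account, pipe, token string, files []string) error {
	type stagedFile struct {
		Path string `json:"path"`
	}
	entries := make([]stagedFile, len(files))
	for i, f := range files {
		entries[i] = stagedFile{Path: f}
	}
	body, err := json.Marshal(map[string][]stagedFile{"files": entries})
	if err != nil {
		return err
	}
	url := fmt.Sprintf("https://%s.snowflakecomputing.com/v1/data/pipes/%s/insertFiles", account, pipe)
	req, err := http.NewRequest(http.MethodPost, url, strings.NewReader(string(body)))
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("insertFiles: unexpected status %s", resp.Status)
	}
	return nil
}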

@sfc-gh-jbahk (Collaborator)

No need to open a separate issue for variant binding; it's already in our backlog (#217). The driver's implementation of it depends on a lot of things, so it will take some time before it's fully rolled out. I think the discrepancy comes from how the server decides to handle each of these queries.

The threshold is something we've decided not to let users control, for a number of reasons: among them, the stage creation/insertion/copy mechanism puts a different kind of load on the backend, so there's a tradeoff against the overhead of setting it all up. The number may change in the future, but it will remain internal for the time being.

I appreciate your patience with these matters.

mihaitodor added a commit to mihaitodor/connect that referenced this issue Oct 11, 2022
When inserting large amounts of data, the `snowflake_put` output
should outperform `sql_insert` because it uses Snowpipe directly.
However, the Snowflake SQL driver also attempts to send the data
to a temporary stage when the number of messages in a batch
exceeds an [undocumented](https://pkg.go.dev/github.com/snowflakedb/gosnowflake#hdr-Batch_Inserts_and_Binding_Parameters)
threshold, and the behaviour around that threshold is
inconsistent. Details [here](snowflakedb/gosnowflake#540).

This implementation should satisfy the basic use cases, but it
might be possible to optimise inserts later by binding array
variables to parameters. Details [here](https://pkg.go.dev/github.com/snowflakedb/gosnowflake#hdr-Binding_Parameters_to_Array_Variables).

Unfortunately, I was unable to add integration tests based on
MySQL as described [here](snowflakedb/gosnowflake#279)
because [this code](https://github.com/snowflakedb/gosnowflake/blob/74e351e5e110c5b4c409730b47d5fa4058ab1c6f/auth.go#L45-L91)
does not allow me to set the `authenticator` parameter to
`tokenaccessor`, so I can't get the driver to bypass the
authentication step. There might be other blockers too, but I'm not sure.