Specify external stage upon log_model #110

Open
brentonmallen1 opened this issue Jul 22, 2024 · 9 comments

Comments

@brentonmallen1

I've been trying to use the log_model mechanism to upload a custom model object, but I'm running into permission issues because the function's default behavior appears to be to create and use an internal stage. I'd like to use a specific external stage that has already been created, but I haven't found a straightforward way to do this. Is this possible? Any perspective would be appreciated. Thank you!

@sfc-gh-sdas
Collaborator

Thanks for your comment.

The internal stage that log_model() uses is a temporary one. It gets deleted as soon as the session is terminated. Ultimately, all the weight files are adopted inside the model object; the temp stage is used for staging only.

May I ask why you need an external stage for such an ephemeral use case?

@brentonmallen1
Author

Thanks for the response!

I think that 1) I'm not sure I understand how a model upload is meant to work and 2) the specific call to the internal temporary stage is blocked for security reasons, which is outside of my control. I pored over the code and it seems the stage name is meant to be a UUID that, by design, can't be altered or otherwise specified. The model is a custom model (a pre-trained random forest) and, as far as I can tell, there's no manual way to upload a custom model from within the Snowflake GUI/console. It seems to want to walk the user through the training process for a select few models.

My current understanding is that upon model registration and upload, the model object can then be used as a UDF of sorts within an ordinary query. Is there a way to get the model to its final destination, so that it can be used in this way, without the need for this temporary stage?

Any guidance would be appreciated. Thank you!

@sfc-gh-sdas
Collaborator

sfc-gh-sdas commented Jul 24, 2024

> I'm not sure I'm understanding how a model upload is meant to work

A model has its own storage, which is meant to be immutable once constructed. Hence, we first upload the serialized model to a temp stage and then copy it from there to the immutable stage when the model is constructed.
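
To make the flow above a bit more concrete, here is a minimal local sketch in plain Python. It is only an analogy for "temp stage, then copy into immutable storage" and says nothing about how Snowflake implements it: the function name `stage_then_publish`, the directory layout, and the use of a read-only file mode to stand in for immutability are all assumptions made for illustration.

```python
import shutil
import stat
import tempfile
from pathlib import Path


def stage_then_publish(model_bytes: bytes, final_dir: Path, name: str) -> Path:
    """Write to a temporary staging directory, then copy into final storage.

    Hypothetical local analogy of "temp stage -> immutable model storage".
    """
    final_dir.mkdir(parents=True, exist_ok=True)
    final_path = final_dir / name
    if final_path.exists():
        # Published artifacts are never overwritten in this sketch.
        raise FileExistsError(f"{final_path} is already published")

    # The temporary directory plays the role of the session-scoped temp stage:
    # it is deleted as soon as the `with` block exits.
    with tempfile.TemporaryDirectory() as stage:
        staged = Path(stage) / name
        staged.write_bytes(model_bytes)
        shutil.copy2(staged, final_path)

    # Drop write permission on the published copy to mimic immutability.
    final_path.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return final_path
```

The point of the two steps is that the caller only ever writes to the short-lived staging area; the final location is written once by the publishing step and is read-only afterwards.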

> the specific call to the internal temporary stage is blocked for security reasons which is outside of my control.

Do you know what the security concern is? We have one more security measure coming for temp objects. If I know the concern, I can say whether the new feature would fix the issue.

> Is there a way to get the model to its final destination, so that it can be used in this way, without the need for this temporary stage?

Currently, no. The final destination is immutable, so users cannot write to it directly.

You asked whether an external stage could be used. Is an external stage the only thing you can write to? Do you have access to a permanent internal stage that is writable? Currently our SDK does not support a user-supplied writable stage, but potentially we could allow that.

@brentonmallen1
Author

Thanks for clarifying the immutable storage pipeline. I don't know what the exact security concerns are, but my speculation is that they might be related to not having control over the pipeline that stores the model object, in order to ensure compliance and validation. This makes me wonder whether the immutable storage itself might be blocked as well if it's internal, were it directly accessible/writable.

My understanding is that internal storage is owned/hosted by Snowflake, and I think that might be the main issue. It seems like a customer-owned / external immutable storage mechanism specifically to house the model objects would be a viable option, but I'm not sure. I'll try to find more information about the concerns if I can.

@brentonmallen1
Author

After looking into things further, it seems the issue I'm encountering may not be a permissions issue but rather an issue with our proxy that might be inhibiting the file uploads. When I retry logging the model, I get the error for different parts of the upload: sometimes for the manifest file, other times for the zip file. I've been told by colleagues that uploads through GUIs tend to be a bit more robust for us.
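
For what it's worth, the manual retries described above could be wrapped in a small helper. This is just a sketch of generic retry with exponential backoff and jitter; `retry` and its parameters are hypothetical names, and nothing here is part of the Snowflake SDK. It catches every exception because the thread doesn't say which error type the failed upload raises, so narrowing the `except` clause is left as an exercise.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    call: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying with exponential backoff plus jitter on failure."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception as exc:  # broad on purpose; narrow once the error type is known
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt; jitter avoids retrying in lockstep.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
            print(f"attempt {attempt} failed ({exc!r}); retrying in {delay:.2f}s")
            sleep(delay)
```

Assuming `registry` is an existing registry object, usage would look something like `retry(lambda: registry.log_model(model, model_name="my_model", version_name="v1"))`. This only papers over intermittent failures; it doesn't address whatever the proxy is doing to the uploads.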

Refining the idea here a bit: maybe there could be some mechanism or path that allows the custom model artifacts to be created and/or uploaded from within the GUI/console in the ML studio section. For example, upload the pickled model, some input and output data for the signature inference, and a list of dependencies (or other arguments to the log_model call).

I realize this is an esoteric issue/experience, so no worries about marking this closed if that's best. I can comment/reopen when/if I figure things out. I'd just like to avoid the stored proc -> external function -> API integration path that seems to be my only alternative, as it is much more difficult to develop, test, and maintain overall.
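
For context, the same three inputs (pickled model, sample input, dependency list) map onto the Python call roughly as sketched below. The helper `log_pickled_model` is a hypothetical name, and `registry` is assumed to be a `snowflake.ml.registry.Registry` created elsewhere; the keyword names mirror `Registry.log_model` as I understand it, so check them against the installed snowflake-ml-python version.

```python
import pickle
from typing import Any, Sequence


def log_pickled_model(
    registry: Any,
    pickle_path: str,
    sample_input: Any,
    model_name: str,
    version_name: str,
    conda_dependencies: Sequence[str],
) -> Any:
    """Load a pickled estimator and register it through `registry.log_model`.

    `registry` is assumed to behave like snowflake.ml.registry.Registry.
    """
    with open(pickle_path, "rb") as f:
        model = pickle.load(f)  # only unpickle files from a trusted source

    return registry.log_model(
        model,
        model_name=model_name,
        version_name=version_name,
        sample_input_data=sample_input,  # lets the library infer the signature
        conda_dependencies=list(conda_dependencies),
    )
```

Note that this still goes through the temporary stage discussed earlier in the thread, so it doesn't sidestep the upload problem; it only shows which arguments a GUI upload form would need to collect.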

Any other thoughts or suggestions are valued and appreciated. Thanks again for the time!

@sfc-gh-sdas
Collaborator

Are you on PrivateLink by any chance? We have heard this before from some other customers on PrivateLink. If so, please file a support ticket and include the query ID and other details related to PrivateLink.

Regarding the suggestion of a UI for uploading a model: we have a lot of logic for serializing a model, including an API for custom models. It is hard to support all use cases in a UI because we would have to leave some technical steps for users to perform before uploading. But this is a good suggestion; I'll pass it on to our product managers.

@brentonmallen1
Author

No worries, I appreciate the complexities. What do you mean by a private link? Can you provide a bit more context around that, just so I know how to phrase my ticket to be the most productive?

@sfc-gh-sdas
Collaborator

By PrivateLink I mean https://docs.snowflake.com/en/user-guide/admin-security-privatelink or https://docs.snowflake.com/en/user-guide/privatelink-azure

If not, could you by any chance have a firewall or something similar that blocks traffic intermittently?

@brentonmallen1
Author

I'm not sure, I'll have to ask around.
