Runtime-independent registration of MLflow models #101

Open
Wimsen opened this issue May 15, 2024 · 1 comment

Wimsen commented May 15, 2024

MLflow models are currently registered by passing an in-memory pyfunc model. Snippet from the documentation:

registry.log_model(
    model=mlflow.pyfunc.load_model(model_uri),
    model_name="mlflowModel",
    version_name="v1",
    conda_dependencies=["mlflow<=2.4.0", "scikit-learn", "scipy"],
    options={"ignore_mlflow_dependencies": True}
)

The problem is that mlflow.pyfunc.load_model() requires the model's dependencies to be available in the Python runtime that calls registry.log_model(). The model's dependencies and the runtime's are likely to diverge and, in the worst case, are incompatible with each other.

An example of the latter is a model trained with scikit-learn < 1.2.1. Correct deserialization of the model in the registration runtime is then impossible, because snowflake-ml-python itself depends on scikit-learn (>=1.2.1,<1.4). A workaround is to install a newer version of scikit-learn and load the model with it, but this is inadvisable, since scikit-learn does not guarantee that a model pickled with one version loads correctly in another.
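To make the conflict concrete, here is a minimal sketch that checks whether the scikit-learn version pinned in a model's requirements.txt falls inside the range quoted above. It assumes the model directory is available locally and that the pin uses the `==` form; the function names are hypothetical and are not part of mlflow or snowflake-ml-python.

```python
import re
from pathlib import Path

# Range quoted in this issue for snowflake-ml-python's scikit-learn dependency.
SKLEARN_MIN = (1, 2, 1)  # inclusive lower bound
SKLEARN_MAX = (1, 4)     # exclusive upper bound


def parse_version(text):
    """Turn a version such as '1.1.3' into (1, 1, 3), ignoring any suffix."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"cannot parse version: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def pinned_version(requirements_path, package):
    """Return the version pinned with '==' for `package`, or None if absent."""
    pattern = re.compile(rf"^{re.escape(package)}\s*==\s*([^\s;#]+)", re.IGNORECASE)
    for line in Path(requirements_path).read_text().splitlines():
        match = pattern.match(line.strip())
        if match:
            return match.group(1)
    return None


def sklearn_pin_is_loadable(requirements_path):
    """True if the pinned scikit-learn version fits the registration runtime's range."""
    pinned = pinned_version(requirements_path, "scikit-learn")
    if pinned is None:
        return True  # nothing pinned, so no known conflict
    return SKLEARN_MIN <= parse_version(pinned) < SKLEARN_MAX
```

A model pinned to scikit-learn==1.1.3 fails this check, which is exactly the case where loading it in the registration runtime cannot be done with the version it was trained with.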

Is it possible to make the registration of MLflow models independent of the registered model's dependencies? Ideally, model registration would just upload the model artifacts to the model registry, and the actual loading and deserialization of the MLflow model would happen at inference time with the correct dependencies.
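As a rough illustration of what such a load-free registration would need, the sketch below collects the artifact files and the pip requirements from a local MLflow model directory using only the standard library, so none of the model's dependencies are imported. It assumes model_uri has already been resolved to a local directory that contains the usual MLmodel and requirements.txt files. describe_model_dir is a hypothetical helper, not an existing snowflake-ml-python or mlflow function.

```python
from pathlib import Path


def describe_model_dir(model_dir):
    """Collect artifacts and requirements from a local MLflow model directory."""
    root = Path(model_dir)
    if not (root / "MLmodel").is_file():
        raise FileNotFoundError(f"{root} does not look like an MLflow model (no MLmodel file)")

    # Every file under the directory is an artifact that could be uploaded as-is.
    artifacts = sorted(
        path.relative_to(root).as_posix() for path in root.rglob("*") if path.is_file()
    )

    # requirements.txt is plain text, so reading it imports none of the listed packages.
    requirements = []
    req_file = root / "requirements.txt"
    if req_file.is_file():
        for line in req_file.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#"):
                requirements.append(line)

    return {"artifacts": artifacts, "requirements": requirements}
```

The returned requirements list is the kind of information that currently has to be passed by hand through conda_dependencies. Whether the registry could consume it directly is up to the maintainers.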

@sfc-gh-sdas
Collaborator

Thanks for reporting, and apologies for the late reply.

This is a valid concern. Ideally, we should not require you to call pyfunc.load_model(); instead, we should try to get the information directly from model_uri. Let us look into this.
