-
Notifications
You must be signed in to change notification settings - Fork 131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Check services before deleting MLModel #3191
Comments
How about each tool provides a default value from code? |
Hi @zane-neo and @dhrubo-os ! Is it possible that some field, when the user is searching for the agent, can be hidden? We don't wish to return redundant fields like |
Plus one more con for option 3: need to maintain 2 maps, one in ml-commons and another in skills. It's not quite straightforward to merge two maps. |
Plus one thing: model controller. Hi @xinyual! Can you include it in the RFC? |
model controller is to set limitation for multiply users and not a downstream task. We don't need to consider it here. |
If so, it comes back to the option 1. Why option 2 is here is we want customer to input it and be clear about what relatedModelIds is. Besides, option 2 will keep the consistency between request body and GET/SEARCH agent API. |
IMO, the difference lies in whether the customer can override the default value of relatedModelIds. |
To the best of my knowledge, model controller relies on a single model id. If the model gets deleted, the model controller becomes useless. Let's also discuss whether the model controller should get deleted after deleting the model. |
Yes this can be excluded in search and get request. |
It make sense, we should consider model controller as it's targeting specific models. |
Option1 LGTM, but let's refine the names like method name and field name. Also another thing we need consider is the new field is not visible to users and it's reserved, we should update the doc and add validation logic in agent creation to ensure users are not using it as their business key. |
From my perspective, model controller is just a config for one user when using a model. Deleting the model won't influence the user's experience related with the deleted model so much. |
True it doesn't have functionality impact not cleaning model controller, but it's a good practice to remove the unnecessary resources in system if the underlying dependency is been cleaned up. |
Agree that model controller should be cleaned up. There are a few options for user experience.
The problem also persist in agent and search processor. I prefer a unified way. |
Background
Currently, models in ML-commons are widely used, particularly by pipelines and the agent framework. However, we lack a mechanism to detect if a model is being utilized by other services. To address this, we propose a new feature that will identify whether a model is in use by another service. If so, an error should be raised—similar to when attempting to delete a connector that is still in use (like here).
All related service
We list all service using MLModel.
Solution
We can apply the same solution used for connectors and models. Generally, when attempting to delete a specific model ID, we can search both services (pipelines and agents) to check if they are using the current model ID. We have separate solution for each service since they are different
Pipeline:
For all the pipelines we developed, we found that they all use the key model_id to document their relevant model id. Therefore, we can use GetPipelineAction or GetSearchPipelineAction to retrieve all pipelines and then check the key “model_id” to detect whether they reference the current model ID.
Agent and Tools
Tools are more complex than pipelines because they use different keys to store the model IDs they reference. Furthermore, some tools use more than one model ID. For example, RAGTool requires both embedding_model_id and inference_model_id, while PPLTool uses model_id to document its model. The target field we need to check varies by tool, making it challenging to detect relevant models with a single template. Therefore, we propose a few options to address this complexity.
Option 1 (prefer)
We add an abstract function GETAllModelKeys in Base Class Factory.
Each Tool’s factory needs to implement function and return a list of string which contains all fields can use model. For example, PPLTool should return [“model_id”] while RAGTool will return [“embedding_model_id”, “inference_model_id”].
When registering an agent, we call these functions to retrieve the relevant field names, parse the input parameters, and then write them to a fixed key, relatedModelIds. For example, A flow agent containing PPLTool's registration body is:
After parsing in registering, we will write the following body into index
Then we can use the following DSL template to query:
Pros: We don’t change any API. When develop new tools, we don’t have additional maintenance cost.
Cons: When customer call GET AGENT API, the “relatedModelIds” field will be shown and may confuse customer.
Option 2
For basic class Tool, we will implement an attribute “relatedModelIds”. All tools can be registered with a parameter:, it’s an optional one. For example,
then we can use the following DSL template to query agent index:
Pros: Relatively low code change.
Cons: If customer doesn’t provide this in request body, we cannot know which model is used by tool and the solution won’t become effective. It also requires redundant information in the request body, which may confuse users.
option 3
We maintain two separate mapping objects in ML-commons and skills. The key is the tool type, and the value is the field name that stores related model IDs. When searching for relevant agents, we iterate through all mappings and construct a DSL for each tool.
For example, we have
Then we would create a DSL for PPLTool like
Pros: It will not impact the user interaction experience.
Cons: High maintenance cost, as each new tool requires adding a corresponding value to the map.
The text was updated successfully, but these errors were encountered: