Update/Delete batch inference tasks that have been idle for a certain time #3161

Open
rbhavna opened this issue Oct 24, 2024 · 1 comment
Assignees
Labels
enhancement New feature or request

Comments

@rbhavna
Collaborator

rbhavna commented Oct 24, 2024

Batch inference includes task management for batch jobs, which lets a user get or cancel batch jobs running on a remote service. Currently, whenever a user requests the details of a batch job, we send the request to the remote service and update the task status in ml-commons accordingly. Tasks that are still in running/created status are considered active. We now limit the number of batch inference requests a user can send based on the number of running/created tasks in the index. This limit is controlled by a setting whose default value is currently 10.
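
To make the current limit concrete, here is a minimal sketch of the check described above. It is illustrative only and not the ml-commons implementation: the `BatchTask` type, the state names `CREATED`/`RUNNING`, the function names, and the strict "less than" comparison are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumed state names; the issue only says "running/created".
ACTIVE_STATES = {"CREATED", "RUNNING"}
# Default limit mentioned in the issue.
DEFAULT_MAX_ACTIVE_TASKS = 10


@dataclass
class BatchTask:
    """Hypothetical stand-in for a task document stored in the index."""
    task_id: str
    state: str


def count_active(tasks: Iterable[BatchTask]) -> int:
    """Count tasks the index still considers active."""
    return sum(1 for task in tasks if task.state in ACTIVE_STATES)


def can_accept_batch_request(
    tasks: Iterable[BatchTask],
    max_active: int = DEFAULT_MAX_ACTIVE_TASKS,
) -> bool:
    """Admit a new request only while active tasks are below the limit.

    Whether the real check is strict or inclusive is an assumption here.
    """
    return count_active(tasks) < max_active
```

Because the count is taken from the index, a task that already finished on the remote service but was never refreshed still counts against the limit.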

The problem is that if the user does not explicitly request the batch job details, the status in the ml index stays the same even after it has changed on the remote service. The task list can also grow very large over time, so we need a mechanism to delete or update idle tasks that have not been used for a long time. That way we can clean up the index, and the limit on batch_inference requests will reflect the actual number of active tasks.

We can plan on using a sync-up job that periodically checks for tasks that have been running for a long time, gets their actual status from the remote service, and updates the index accordingly.
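
One pass of such a sync-up job could look roughly like the sketch below. This is a simplified illustration under stated assumptions, not a proposed implementation: `TaskRecord`, `sync_idle_tasks`, the `fetch_remote_state` callback, the in-memory dict standing in for the index, and the idle threshold parameter are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumed state names; the issue only says "running/created".
ACTIVE_STATES = {"CREATED", "RUNNING"}


@dataclass
class TaskRecord:
    """Hypothetical stand-in for a task document stored in the index."""
    task_id: str
    state: str
    last_update_time: float  # epoch seconds


def sync_idle_tasks(
    index: Dict[str, TaskRecord],
    fetch_remote_state: Callable[[str], str],
    now: float,
    idle_threshold_seconds: float,
) -> List[str]:
    """Refresh active tasks that have not been updated recently.

    `fetch_remote_state` is a placeholder for the call to the remote
    service. Returns the ids of the tasks that were refreshed.
    """
    refreshed = []
    for task in index.values():
        if task.state not in ACTIVE_STATES:
            continue  # already terminal, nothing to sync
        if now - task.last_update_time < idle_threshold_seconds:
            continue  # updated recently, skip the remote call
        task.state = fetch_remote_state(task.task_id)
        task.last_update_time = now
        refreshed.append(task.task_id)
    return refreshed
```

Only tasks that are both active and idle trigger a remote call, which keeps the number of requests per run bounded by the number of stale active tasks.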

@rbhavna rbhavna self-assigned this Oct 24, 2024
@rbhavna rbhavna added enhancement New feature or request and removed untriaged labels Oct 24, 2024
@pyek-bot
Contributor

Rather than storing the task status in an index, can we simply fetch it dynamically from the remote service? In that case, when a user issues a new bulk request, we can fetch the current number of running tasks and rate limit on that. However, this is only feasible if the remote services track total tasks and if we are not using the task status in our index for any other purpose.
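
A rough sketch of that idea, with every name hypothetical: `count_remote_active` stands in for a remote call that returns the number of running tasks, or `None` when the service does not track totals. Falling back to the indexed count in that case is an added assumption for illustration, not part of the suggestion itself.

```python
from typing import Callable, Optional


def admit_batch_request(
    count_remote_active: Callable[[], Optional[int]],
    count_indexed_active: Callable[[], int],
    max_active: int = 10,
) -> bool:
    """Rate limit on the remote service's own count of running tasks.

    Both callbacks are placeholders. The index-based fallback is an
    assumption for services that cannot report a total.
    """
    remote_count = count_remote_active()
    if remote_count is not None:
        active = remote_count
    else:
        active = count_indexed_active()
    return active < max_active
```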

Projects
Status: Untriaged
Development

No branches or pull requests

2 participants