When running `inspect eval` or `inspect eval-set`, it would be great to be able to set a global cap on the `total_tokens` consumed (input tokens + output tokens). This is for budgeting reasons: sometimes an experiment turns out to be unexpectedly expensive because of a large number of samples or a high token count per sample, and we'd like such experiments to be stopped automatically.
The ideal behaviour would be to stop all currently running task instances when the global cap is reached, and not to schedule any more, so that the total tokens used stays under budget. Any completed task instances/samples should be recorded as normal so a partial log is still generated (similar to when a run is aborted due to errors).
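To make the requested behaviour concrete, here is a minimal sketch in plain `asyncio`. It does not use Inspect's API: `TokenBudget`, `TokenBudgetExceeded`, `run_sample`, and `run_with_budget` are all hypothetical names, and each "model call" is simulated by a pair of token counts. It only illustrates the three outcomes described above: completed samples are recorded, running samples are stopped, and remaining samples are never scheduled.

```python
import asyncio
from dataclasses import dataclass


class TokenBudgetExceeded(Exception):
    """Hypothetical signal that the global token cap has been reached."""


@dataclass
class TokenBudget:
    """Hypothetical global cap on input + output tokens across all samples."""
    limit: int
    used: int = 0

    @property
    def exhausted(self) -> bool:
        return self.used >= self.limit

    def record(self, input_tokens: int, output_tokens: int) -> None:
        self.used += input_tokens + output_tokens


async def run_sample(calls, budget):
    """Simulate one sample; `calls` is a list of (input_tokens, output_tokens)."""
    for input_tokens, output_tokens in calls:
        if budget.exhausted:
            # Stop a running sample before it makes another model call.
            raise TokenBudgetExceeded
        await asyncio.sleep(0)  # stand-in for a real model call
        budget.record(input_tokens, output_tokens)


async def run_with_budget(samples, budget, max_concurrent=2):
    pending = list(samples.items())
    completed, interrupted = [], []

    async def worker():
        # Do not schedule further samples once the cap is reached.
        while pending and not budget.exhausted:
            sample_id, calls = pending.pop(0)
            try:
                await run_sample(calls, budget)
                completed.append(sample_id)  # recorded as normal
            except TokenBudgetExceeded:
                interrupted.append(sample_id)

    await asyncio.gather(*(worker() for _ in range(max_concurrent)))
    return {
        "completed": completed,
        "interrupted": interrupted,
        "not_started": [sample_id for sample_id, _ in pending],
        "tokens_used": budget.used,
    }
```

Because usage is only known after a call returns, this sketch enforces the cap at call boundaries, so the total can overshoot the limit by the tokens of calls already in flight. Staying strictly under budget would need something extra, such as a safety margin on the limit.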
Being able to set a monetary budget in dollars would be an alternative. I'd consider the two features roughly equivalent, since cost tracks tokens closely for a given model (a monetary budget may be more directly useful but harder to implement).
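One likely reason a monetary budget is harder: providers typically price input and output tokens differently, so the cap would need a per-model price table in addition to the token counts. A minimal sketch of the conversion, assuming a hypothetical `ModelPrice` type (the prices used with it are placeholders, not real rates):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical per-model prices, in dollars per million tokens."""
    input_per_million: float
    output_per_million: float


def estimated_cost(input_tokens: int, output_tokens: int, price: ModelPrice) -> float:
    """Estimate spend in dollars from token counts and a price entry."""
    return (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    ) / 1_000_000


# Placeholder prices for illustration only.
example_price = ModelPrice(input_per_million=1.0, output_per_million=4.0)
example_cost = estimated_cost(2_000_000, 500_000, example_price)
```

A dollar cap could then be checked against `estimated_cost(...)` in the same place a token cap would be checked against `total_tokens`.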