-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support in-place Concat
when input is a captured value
#399
Comments
While working on this I ran into an issue where in-place extension of the KV-cache fails for some caches in the Whisper example because a KV-cache tensor (eg. |
After creating the initial execution plan by traversing the graph backwards from the requested outputs, re-order it by traversing the plan forwards and choosing a preferred operator to run from the frontier of runnable operators at each step. The initial logic for picking the next operator is to choose the first non-in-place operator if there is one (eg. Shape) or the first operator otherwise. This re-ordering increases the likelihood that an operator which can run in-place is able to do so, because its in-place input is less likely to be required by a non in-place op that runs later. Fixes #98 See also #399 (comment).
After creating the initial execution plan by traversing the graph backwards from the requested outputs, re-order it by traversing the plan forwards and choosing a preferred operator to run from the frontier of runnable operators at each step. The initial logic for picking the next operator is to choose the first non-in-place operator if there is one (eg. Shape) or the first operator otherwise. This re-ordering increases the likelihood that an operator which can run in-place is able to do so, because its in-place input is less likely to be required by a non in-place op that runs later. Fixes #98 See also #399 (comment).
Another complication with Hugging Face models exported by Optimum: The KV cache buffers initially reserved by rten-generate don't get used if using the |
When preparing the capture environment to pass to an operator with subgraphs, if a captured value is not going to be needed by subsequent operators in the current graph then it can be passed by value rather than by reference. In the subgraph, this allows the value to be (potentially) used as an in-place input. - Pass `CaptureEnv` to `Graph::run_subgraph` by value rather than by reference - Add map of by-value captures to `CaptureEnv` - Add `CaptureEnv::child` method which creates a child environment with no captures of its own. This is not currently used but will be needed by loop operators which run a subgraph more than once and need to obtain a fresh capture environment for each iteration. Part of #399.
When preparing the capture environment to pass to an operator with subgraphs, if a captured value is not going to be needed by subsequent operators in the current graph then it can be passed by value rather than by reference. In the subgraph, this allows the value to be (potentially) used as an in-place input. - Pass `CaptureEnv` to `Graph::run_subgraph` by value rather than by reference - Add map of by-value captures to `CaptureEnv` - Add `CaptureEnv::child` method which creates a child environment with no captures of its own. This is not currently used but will be needed by loop operators which run a subgraph more than once and need to obtain a fresh capture environment for each iteration. Part of #399.
#239 added an optimization that reduces the cost of KV-cache extension by a
Concat
op fromO(sequence_length)
toO(1)
. This optimization currently does not work if the input KV-cache buffer is a captured value in a subgraph. This is because captured values are exposed to subgraphs as immutable views, even if the value was passed toModel::run
as an owned value. For the optimization to work,Concat
needs to be run as an in-place operation which requires the first input to be an owned value.Solving the general problem that captured values cannot be used in in-place operations would help in other ways too. For example there are models that try to reshape captured values and currently this reshape copies the data unnecessarily.
Conceptually what I think is needed is the ability for subgraph capture environments (
CaptureEnv
) to capture values by value rather than by reference, if those values are not going to be used outside of the subgraph.The text was updated successfully, but these errors were encountered: