
Support in-place Concat when input is a captured value #399

Closed

robertknight opened this issue Nov 2, 2024 · 3 comments
Labels
performance Issues that affect model inference or loading performance

Comments

robertknight (Owner) commented Nov 2, 2024

#239 added an optimization that reduces the cost of KV-cache extension by a Concat op from O(sequence_length) to O(1). This optimization currently does not work if the input KV-cache buffer is a captured value in a subgraph. This is because captured values are exposed to subgraphs as immutable views, even if the value was passed to Model::run as an owned value. For the optimization to work, Concat needs to be run as an in-place operation which requires the first input to be an owned value.
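The O(1) extension relies on the cache buffer having spare capacity so new entries can be appended without copying the existing contents. A minimal sketch of the idea, using a plain `Vec<f32>` as a stand-in for the cache tensor (the `KvCache` type and its methods here are illustrative, not the actual rten API):

```rust
/// Illustrative stand-in for a KV-cache buffer with spare capacity
/// reserved for in-place growth.
struct KvCache {
    data: Vec<f32>,
}

impl KvCache {
    /// Reserve `cap` elements up front so later appends need not reallocate.
    fn with_capacity(cap: usize) -> Self {
        KvCache { data: Vec::with_capacity(cap) }
    }

    /// Append new key/value entries. While spare capacity remains, this is
    /// O(new.len()): the existing contents are not copied, i.e. the cost is
    /// O(1) in the current sequence length. This is only possible because
    /// we own the buffer mutably -- an immutable view (as captured values
    /// currently are in subgraphs) would force a full copy instead.
    fn extend(&mut self, new: &[f32]) {
        self.data.extend_from_slice(new);
    }
}
```

The Concat optimization follows the same pattern: if the first input is an owned buffer with room to grow, the op appends in place rather than allocating and copying.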

Solving the general problem that captured values cannot be used in in-place operations would help in other ways too. For example, some models try to reshape captured values, and this reshape currently copies the data unnecessarily.

Conceptually, what I think is needed is the ability for subgraph capture environments (CaptureEnv) to capture values by value rather than by reference when those values will not be used outside the subgraph.
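A minimal sketch of that idea: a capture environment whose entries are either borrowed views or owned values, where only owned entries can be taken out for in-place use. The types and method names below are hypothetical simplifications, not rten's actual `CaptureEnv`:

```rust
use std::collections::HashMap;

/// A captured value is either a borrowed view or an owned buffer.
enum Capture<'a> {
    /// Immutable view into the parent graph; cannot be used in-place.
    ByRef(&'a Vec<f32>),
    /// Owned value; eligible for in-place ops (e.g. Concat) in the subgraph.
    ByValue(Vec<f32>),
}

struct CaptureEnv<'a> {
    captures: HashMap<String, Capture<'a>>,
}

impl<'a> CaptureEnv<'a> {
    /// Take ownership of a capture if it was captured by value, allowing
    /// the subgraph to run in-place operations on it. By-ref captures are
    /// left in place and `None` is returned.
    fn take(&mut self, name: &str) -> Option<Vec<f32>> {
        match self.captures.remove(name)? {
            Capture::ByValue(v) => Some(v),
            Capture::ByRef(r) => {
                // Only borrowed: put the view back and report no ownership.
                self.captures.insert(name.to_string(), Capture::ByRef(r));
                None
            }
        }
    }
}
```

Under this scheme, the planner would choose `ByValue` only when no operator outside the subgraph still needs the value.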

@robertknight robertknight added the performance Issues that affect model inference or loading performance label Nov 2, 2024
robertknight (Owner) commented
While working on this I ran into an issue where in-place extension of the KV-cache fails for some caches in the Whisper example, because a KV-cache tensor (e.g. past_key_values.0.decoder.key) is an input to both a Concat op and a Shape op, and the Concat op gets scheduled first. The buffer can't be used in-place because the Shape op still needs it later. See also #98.

robertknight added a commit that referenced this issue Nov 12, 2024
After creating the initial execution plan by traversing the graph backwards from
the requested outputs, re-order it by traversing the plan forwards and choosing
a preferred operator to run from the frontier of runnable operators at each
step. The initial logic for picking the next operator is to choose the first
non-in-place operator if there is one (e.g. Shape), or the first operator
otherwise.

This re-ordering increases the likelihood that an operator which can run
in-place is able to do so, because its in-place input is less likely to be
required by a non in-place op that runs later.

Fixes #98
See also #399 (comment).
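The selection heuristic from the commit message above can be sketched as a small helper over the frontier of runnable operators. The `Op` type and `pick_next` function are illustrative assumptions, not rten's internals:

```rust
/// Minimal stand-in for a planned operator: all we need here is whether
/// it is able to run in-place on its first input.
#[derive(Clone, Copy, Debug)]
struct Op {
    id: usize,
    can_run_in_place: bool,
}

/// From the frontier of currently-runnable operators, pick the first
/// operator that cannot run in-place (e.g. Shape), or the first operator
/// otherwise. Running non-in-place readers first makes it more likely that
/// an in-place op's input buffer is no longer needed when it runs.
fn pick_next(frontier: &[Op]) -> Option<usize> {
    if frontier.is_empty() {
        return None;
    }
    Some(
        frontier
            .iter()
            .position(|op| !op.can_run_in_place)
            .unwrap_or(0),
    )
}
```

In the Whisper case above, this scheduling picks Shape ahead of Concat, so the Concat op is then free to extend the cache buffer in place.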
robertknight added a commit that referenced this issue Nov 13, 2024 (same message as the Nov 12 commit above)
robertknight (Owner) commented
Another complication with Hugging Face models exported by Optimum: the KV-cache buffers initially reserved by rten-generate are not used when running the decoder_model_merged model. The first run of the model takes an alternate branch which does not use the past_key_values.{layer}.decoder.{key, value} inputs; instead, new tensors are allocated without spare capacity reserved for in-place growth.

robertknight added a commit that referenced this issue Nov 14, 2024
When preparing the capture environment to pass to an operator with subgraphs,
if a captured value is not going to be needed by subsequent operators in the
current graph then it can be passed by value rather than by reference. In
the subgraph, this allows the value to be (potentially) used as an in-place
input.

 - Pass `CaptureEnv` to `Graph::run_subgraph` by value rather than by reference

 - Add map of by-value captures to `CaptureEnv`

 - Add `CaptureEnv::child` method which creates a child environment with no
   captures of its own. This is not currently used but will be needed by loop
   operators which run a subgraph more than once and need to obtain a fresh
   capture environment for each iteration.

Part of #399.
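The decision described in the commit message, pass a capture by value only when no later operator in the current graph still consumes it, can be sketched as a tiny helper. The names (`CaptureMode`, `capture_mode`, `later_consumers`) are hypothetical, used only to illustrate the rule:

```rust
#[derive(Debug, PartialEq)]
enum CaptureMode {
    /// The value is still needed in the current graph: the subgraph only
    /// gets an immutable view.
    ByRef,
    /// No later operator needs the value: ownership can move into the
    /// subgraph, making it eligible for in-place ops like Concat.
    ByValue,
}

/// Decide how a value should be captured for a subgraph, given the IDs of
/// values that operators scheduled after this point still consume.
fn capture_mode(value_id: usize, later_consumers: &[usize]) -> CaptureMode {
    if later_consumers.contains(&value_id) {
        CaptureMode::ByRef
    } else {
        CaptureMode::ByValue
    }
}
```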
robertknight added a commit that referenced this issue Nov 14, 2024 (same message as the previous commit)
robertknight (Owner) commented
Resolved in #407. Some additional changes were required in rten-generate (#408) to take advantage of this in the Hugging Face models (see the Whisper and TrOCR examples) where the issue originally came up.
