Referring to Figure 1 of the paper: how can I independently measure the performance (RPS or latency) of the prefill phase and the decode phase?
So far I have tried the following settings:
Prefill: input_len = x, output_len = 1
Decode: input_len = 1, output_len = y
Is this method correct? I suspect it does not account for the KV cache (with input_len = 1, decode starts from an almost empty cache), but I don't know how to simulate that part.
Is there a more precise way to do this?
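One alternative I am considering, as a rough sketch rather than anything taken from the paper: run the same prompt length x twice, once with output_len = 1 and once with output_len = y, and attribute the difference to decode. That way the decode steps run against a KV cache that already holds x tokens. The function and field names below are hypothetical, and the sketch assumes a single request at a time (no batching or queueing) and that the output_len = 1 run is dominated by prefill.

```python
from dataclasses import dataclass


@dataclass
class PhaseEstimate:
    prefill_latency_s: float           # latency of the (x, 1) run
    decode_latency_per_token_s: float  # average time per decoded token


def split_phases(t_prefill_only_s: float, t_full_s: float, output_len: int) -> PhaseEstimate:
    """Estimate per-phase latency from two end-to-end measurements.

    t_prefill_only_s: measured latency with input_len = x, output_len = 1
    t_full_s:         measured latency with input_len = x, output_len = y
    output_len:       y, the number of generated tokens in the second run
    """
    if output_len < 2:
        raise ValueError("output_len must be at least 2 to isolate decode steps")
    decode_total_s = t_full_s - t_prefill_only_s
    if decode_total_s < 0:
        raise ValueError("full run was faster than prefill-only run; measurements are inconsistent")
    # The first token comes out of the prefill pass, so y - 1 tokens are decode steps.
    return PhaseEstimate(t_prefill_only_s, decode_total_s / (output_len - 1))


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    est = split_phases(t_prefill_only_s=0.5, t_full_s=2.5, output_len=101)
    print(est)
```

Would this subtraction give a fair per-token decode latency, or does it hide effects I should measure separately?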
I would greatly appreciate any guidance on profiling these two phases separately.