How to independently measure the performance of the Prefill phase and the Decode phase? #50

Open
J1nLo opened this issue Nov 19, 2024 · 0 comments

Regarding Figure 1 of the paper: how can I independently measure the performance (rps or latency) of the prefill phase and the decode phase?

So far, I have tried the following:

  • Prefill: input_len = x, output_len = 1
  • Decode: input_len = 1, output_len = y

Is this method correct? I suspect it does not account for the KV cache in the decode case, but I don't know how to simulate that part.

Is there a more precise way?
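
One possible alternative, as a rough sketch and not the paper's method: send a single streaming request with a realistic prompt and split its latency by token timestamps. Time to first token (TTFT) approximates prefill, and the mean gap between later tokens approximates a decode step running against the real KV cache. Here `token_stream` is assumed to be any iterable that yields one item per generated token, and `fake_stream` is a hypothetical stand-in for a real streaming client. TTFT also includes queueing and sampling of the first token, so it is only an upper bound on prefill time.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def split_phase_latency(
    token_stream: Iterable, clock=time.perf_counter
) -> Tuple[float, Optional[float]]:
    """Return (ttft, mean_time_per_output_token) for one streamed request.

    ttft approximates prefill latency; the mean inter-token gap
    approximates per-step decode latency. The second value is None
    when only one token was produced.
    """
    start = clock()
    stamps = []
    for _ in token_stream:  # one timestamp per generated token
        stamps.append(clock())
    if not stamps:
        raise ValueError("stream produced no tokens")
    ttft = stamps[0] - start
    n_decode_steps = len(stamps) - 1
    if n_decode_steps == 0:
        return ttft, None
    tpot = (stamps[-1] - stamps[0]) / n_decode_steps
    return ttft, tpot


def fake_stream(prefill_s: float, step_s: float, n_tokens: int) -> Iterator[int]:
    """Hypothetical stand-in for a streaming client (assumption, not a real API)."""
    time.sleep(prefill_s)  # simulated prefill before the first token
    yield 0
    for i in range(1, n_tokens):
        time.sleep(step_s)  # simulated decode step
        yield i


if __name__ == "__main__":
    ttft, tpot = split_phase_latency(fake_stream(0.05, 0.01, 8))
    print(f"ttft={ttft:.4f}s  time_per_output_token={tpot:.4f}s")
```

With a real server, `fake_stream` would be replaced by the iterator returned from the streaming API, created lazily so that the request is sent only after `start` is recorded.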

I would greatly appreciate any guidance on profiling these two phases separately.
