[Bug]: The MLA implementation brings no benefit #274

Open
foamliu opened this issue Jan 1, 2025 · 0 comments
Labels: bug, triage

Comments


foamliu commented Jan 1, 2025

Is there an existing issue?

  • I have searched, and there is no existing issue.

Describe the bug

The MLA (multi-head latent attention) implementation was meant to speed up inference. However, because the data it writes to the KV cache is larger than the baseline's (Llama), it brings no benefit at all: compared with the baseline it uses more GPU memory and runs inference more slowly.
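To make the claim concrete, here is a rough back-of-envelope sketch (mine, not from the repo) comparing per-token, per-layer KV-cache sizes. The dimensions are assumptions taken from the published DeepSeek-V3 config and a GQA Llama baseline; adjust them to the actual models under test:

```python
def cache_elems_llama(num_kv_heads=8, head_dim=128):
    # Llama-style GQA baseline: cache K and V for each KV head.
    return 2 * num_kv_heads * head_dim

def cache_elems_mla_full(num_heads=128, qk_nope_head_dim=128,
                         qk_rope_head_dim=64, v_head_dim=128):
    # MLA as in the HF reference code: the *up-projected* per-head
    # key_states / value_states are what go into the cache.
    k = num_heads * (qk_nope_head_dim + qk_rope_head_dim)
    v = num_heads * v_head_dim
    return k + v

def cache_elems_mla_latent(kv_lora_rank=512, qk_rope_head_dim=64):
    # MLA as described in the DeepSeek-V2 paper: cache only the compressed
    # latent c_KV plus the shared, decoupled RoPE key.
    return kv_lora_rank + qk_rope_head_dim

print(cache_elems_llama())       # 2048 elements/token/layer
print(cache_elems_mla_full())    # 40960 -- roughly 20x the GQA baseline
print(cache_elems_mla_latent())  # 576 -- the saving MLA was designed for
```

Under these assumptions, caching the up-projected tensors inflates the cache by an order of magnitude instead of shrinking it, which matches the slower, more memory-hungry behavior reported above.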

To Reproduce

Benchmark inference speed.

Expected behavior

Improved inference speed.

Screenshots

Below is the MLA implementation from the official DeepSeekV3 repo on Hugging Face; as the screenshot shows, the amount of data written to the KV cache is even larger than the baseline's (Llama).
[screenshot: DeepSeekV3 HF MLA attention implementation]
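For contrast, here is a minimal sketch (assumed shapes and weight names, not the repo's code) of caching only the compressed latent instead. The up-projections are applied after reading from the cache, or absorbed into the query/output projections as the DeepSeek-V2 paper suggests, so only kv_lora_rank plus the shared RoPE-key elements per token would need to be stored:

```python
import torch

bsz, seq, kv_lora_rank, num_heads, head_dim = 1, 16, 512, 128, 128

# The down-projected latent, produced once per token; this is all that
# would need to live in the KV cache.
compressed_kv = torch.randn(bsz, seq, kv_lora_rank)

# Hypothetical up-projection weights, applied on the fly at attention time.
W_UK = torch.randn(kv_lora_rank, num_heads * head_dim)
W_UV = torch.randn(kv_lora_rank, num_heads * head_dim)

k_nope = (compressed_kv @ W_UK).view(bsz, seq, num_heads, head_dim)
value  = (compressed_kv @ W_UV).view(bsz, seq, num_heads, head_dim)
```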

Environment

- OS: Ubuntu 22.04
- Pytorch: torch 2.4.0
- CUDA: CUDA 12.1
- Device: A800

Additional context

[screenshot]
