ScaleAttention: a custom CUDA kernel, optimized for inference. #356

guocuimi · 2025-01-01T04:08:54Z

TODOS:

guocuimi added the roadmap and performance labels Jan 1, 2025

guocuimi mentioned this issue Jan 1, 2025

kernel: added attention kernel for sm80 (Happy new year!) #355

Merged

guocuimi changed the title ~~ScaleAttention: a custom CUDA kernel, optimized for Multi-Query Attention~~ ScaleAttention: a custom CUDA kernel, optimized for inference. Jan 1, 2025

guocuimi pinned this issue Jan 1, 2025
