Enable flash attention #20448
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           master   #20448      +/-   ##
==========================================
- Coverage   82.03%   82.02%   -0.02%
==========================================
  Files         515      515
  Lines       47346    47383      +37
  Branches     7427     7435       +8
==========================================
+ Hits        38842    38865      +23
- Misses       6705     6714       +9
- Partials    1799     1804       +5

Flags with carried forward coverage won't be shown.
Thanks for the PR!
Force-pushed from 3e34498 to a36d119
query,
key,
value,
return_attention_scores,
@divyashreepathihalli shouldn't return_attention_scores have a default value here? There might be many call sites of _compute_attention that don't pass a return_attention_scores value, and I think this change can break them. There is an example here. So I was wondering if it's possible to set a default value here so that other references to _compute_attention work as before.
I think this change is probably the reason that the test is failing in keras-team/keras-hub#1977.
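To make the concern concrete, here is a minimal sketch of the failure mode with stand-in signatures (simplified; not the actual Keras method bodies):

```python
# Before this PR (simplified): no return_attention_scores parameter.
def _compute_attention_old(query, key, value, attention_mask=None, training=None):
    ...

# After this PR (simplified): a new positional parameter with no default.
def _compute_attention_new(query, key, value, return_attention_scores,
                           attention_mask=None, training=None):
    ...

# Existing downstream call sites were written against the old signature:
_compute_attention_old("q", "k", "v")  # fine

try:
    _compute_attention_new("q", "k", "v")
except TypeError as e:
    # missing 1 required positional argument: 'return_attention_scores'
    print(e)
```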
This is tricky: right now the value passed to the call method is passed through to _compute_attention. If we add a default value here and users don't forward the arg value from call, the default might override the call arg, and that could cause discrepancies.
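Sketching the discrepancy being described, with simplified stand-ins (not the real implementation): once a default exists, an internal call that forgets to forward the flag fails silently instead of loudly:

```python
def _compute_attention(query, key, value, return_attention_scores=False):
    # With a default, nothing forces callers to forward the flag.
    return ("output", "scores") if return_attention_scores else "output"

def call(query, key, value, return_attention_scores=False):
    # Bug: the call-time flag is not forwarded, so the parameter's
    # default (False) silently overrides the user's True.
    return _compute_attention(query, key, value)

# Prints "output": the requested attention scores are silently dropped.
print(call("q", "k", "v", return_attention_scores=True))
```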
Isn't there any other way to check and make sure that users pass the arg value (rather than relying on the lack of a default value here)?
I think a workaround would be to add a self._return_attention_scores attribute, set it in the call method, and use it in _compute_attention. wdyt?
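A minimal sketch of that workaround, using a toy layer (ToyAttention and its einsum math are illustrative, not the merged Keras code):

```python
import keras
from keras import ops


class ToyAttention(keras.layers.Layer):
    def call(self, query, key, value, return_attention_scores=False):
        # Stash the call-time flag on the instance so _compute_attention
        # keeps its old signature and existing overrides/call sites
        # continue to work unchanged.
        self._return_attention_scores = return_attention_scores
        return self._compute_attention(query, key, value)

    def _compute_attention(self, query, key, value):
        # Plain scaled dot-product attention, for illustration only.
        scale = 1.0 / ops.sqrt(ops.cast(ops.shape(query)[-1], "float32"))
        scores = ops.softmax(
            ops.einsum("bqd,bkd->bqk", query, key) * scale, axis=-1
        )
        output = ops.einsum("bqk,bkd->bqd", scores, value)
        # Read the flag from the instance instead of a new parameter.
        if self._return_attention_scores:
            return output, scores
        return output
```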
This PR:
- the _compute_attention method would just call ops.dot_product_attention
- adds keras.config.enable_flash_attention and keras.config.is_flash_attention_enabled
- _masked_softmax implementation
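For reference, the two config functions and the op named above can be exercised directly; the shapes below are arbitrary, and whether a fused flash kernel is actually used depends on the backend and hardware:

```python
import numpy as np
import keras
from keras import ops

# Global toggle added by this PR; layers that support it pick it up.
keras.config.enable_flash_attention()
print(keras.config.is_flash_attention_enabled())

# ops.dot_product_attention computes softmax(QK^T * scale) V in one op,
# dispatching to a flash-attention kernel where the backend supports it.
# Inputs follow the (batch, seq_len, num_heads, head_dim) convention.
batch, seq_len, num_heads, head_dim = 2, 8, 4, 16
q = np.random.normal(size=(batch, seq_len, num_heads, head_dim)).astype("float32")
k = np.random.normal(size=(batch, seq_len, num_heads, head_dim)).astype("float32")
v = np.random.normal(size=(batch, seq_len, num_heads, head_dim)).astype("float32")

out = ops.dot_product_attention(q, k, v)
print(out.shape)  # (2, 8, 4, 16)
```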