
SageAttention on ComfyUI #11

Open
blepping opened this issue Oct 14, 2024 · 2 comments

Comments

@blepping

I made a very simple ComfyUI node that replaces the attention implementation with SageAttention: https://gist.github.com/blepping/fbb92a23bc9697976cc0555a0af3d9af

It seems like a decent performance improvement on SDXL. SageAttention seems to fail when k/v aren't the same shape as q (i.e., on attn2, which I believe is cross-attention).

For SD15, none of the head sizes are currently supported, so the node doesn't do anything there. Not sure if you're interested in supporting SD15 (or SDXL cross-attention). If any more information would be helpful, please let me know.
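For anyone curious what the patch boils down to, here's a rough sketch of the idea (not the exact gist code; `SUPPORTED_HEAD_DIMS` is illustrative, and I'm assuming the `sageattention` package's `sageattn(q, k, v, is_causal=..., smooth_k=...)` signature from the release that was current at the time):

```python
# rough sketch, not the gist itself. assumptions: sageattention exposes
# sageattn(q, k, v, is_causal=False, smooth_k=True) with tensors shaped
# (batch, heads, seq, dim), and ComfyUI's optimized_attention takes
# (batch, seq, heads*dim) tensors. the real patch may need to replace
# more than one imported reference to optimized_attention.
import comfy.ldm.modules.attention as comfy_attention
from sageattention import sageattn

ORIG_ATTENTION = comfy_attention.optimized_attention
SUPPORTED_HEAD_DIMS = {64, 128}  # illustrative; check SageAttention's docs

def attention_sage(q, k, v, heads, mask=None, **kwargs):
    batch, seq_q, inner_dim = q.shape
    dim_head = inner_dim // heads
    # fall back to the default attention for unsupported head sizes
    # (e.g. SD15) and for cross-attention, where the k/v sequence
    # length differs from q's
    if (
        dim_head not in SUPPORTED_HEAD_DIMS
        or k.shape[1] != seq_q
        or mask is not None
        or kwargs.get("skip_reshape")
    ):
        return ORIG_ATTENTION(q, k, v, heads, mask=mask, **kwargs)
    # (batch, seq, heads*dim) -> (batch, heads, seq, dim)
    q, k, v = (t.view(batch, -1, heads, dim_head).transpose(1, 2) for t in (q, k, v))
    out = sageattn(q, k, v, is_causal=False, smooth_k=True)
    # back to (batch, seq, heads*dim)
    return out.transpose(1, 2).reshape(batch, seq_q, inner_dim)
```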

You can close this issue; I just thought I'd post this in case anyone wanted to try it with ComfyUI.

Note: it's not a normal model patch, so to enable or disable it, make sure the node actually runs. Simply bypassing or removing it won't work correctly; see the sketch below for why.
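Concretely, the node does its work as a side effect when it executes, something like this (simplified, building on the sketch above; the names are illustrative rather than the gist's actual ones):

```python
class SageAttentionPatch:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "model": ("MODEL",),
                "enabled": ("BOOLEAN", {"default": True}),
            }
        }

    RETURN_TYPES = ("MODEL",)
    FUNCTION = "patch"
    CATEGORY = "hacks"

    def patch(self, model, enabled):
        # a global monkey-patch rather than a per-model patch: the swap only
        # happens (or reverts) when this method runs, which is why simply
        # bypassing the node leaves whatever was set last still in effect
        comfy_attention.optimized_attention = (
            attention_sage if enabled else ORIG_ATTENTION
        )
        return (model,)
```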

@wardensc2

Hi @blepping

I already installed it like you said and got the node working, but so far the speed is still the same. I tested both SDXL and Flux at 1024x1024, and I'm not sure whether the node is working or not because the speed is unchanged. I get this notice when the image finishes generating:
[screenshot of the notice]

Can you share some example JSON workflow files so I can check whether the node is working?

Thank you

@blepping
Author

blepping commented Oct 15, 2024

@wardensc2 Thanks for giving it a try. I don't think there's really a way to set it up wrong in the workflow.
[workflow screenshot]

Attention improvements seem to make the most difference at large resolutions. I didn't test with Flux (not sure if it uses the same kind of attention or has compatible sizes). In my SDXL tests at 4096x4096 on a 4060 Ti, I got 8.94 s/it with PyTorch attention versus 6.71 s/it with SageAttention: about 25% less time per iteration (8.94/6.71 ≈ 1.33, so roughly 33% more iterations per second). The difference might not be noticeable at small resolutions like 1024x1024. (I think I may have been testing with smooth_k disabled; it didn't seem necessary with SDXL and should be a bit faster.)
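If you want to verify the patch is actually active rather than eyeballing the s/it numbers, you can check which function is installed (assuming a global patch on `comfy.ldm.modules.attention.optimized_attention` like the sketch above):

```python
import comfy.ldm.modules.attention as comfy_attention

# after the node has run, this should name the SageAttention wrapper
# instead of ComfyUI's default attention function
print(comfy_attention.optimized_attention)
```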
