DiT with decorator, triton fused_AdaLN and fineGrained #552
base: develop
Thanks for your contribution!
def compute_activation(self, ffn1_out):
    origin_batch_size = ffn1_out.shape[0]
    origin_seq_len = ffn1_out.shape[1]
    ffn1_out = ffn1_out.reshape([origin_batch_size * origin_seq_len, ffn1_out.shape[-1]])
Adding these two reshapes here isn't ideal; I'd suggest extending the fused_bias_act implementation instead.
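One possible reading of this suggestion is to move the flatten/restore logic out of the caller and into a wrapper around the kernel, so that inputs of any rank are accepted. The sketch below is only an illustration under assumptions: it uses NumPy rather than Paddle, `bias_act_2d` is a hypothetical stand-in for a 2-D-only kernel (it is not the real `fused_bias_act`), and ReLU is an arbitrary choice of activation.

```python
import numpy as np


def bias_act_2d(x, bias):
    # Hypothetical stand-in for a kernel that only accepts [tokens, hidden].
    assert x.ndim == 2
    return np.maximum(x + bias, 0.0)  # ReLU, chosen only for illustration


def bias_act_nd(x, bias):
    # Flatten all leading dims, run the 2-D kernel, then restore the shape,
    # so callers no longer reshape around the call themselves.
    lead_shape = x.shape[:-1]
    out = bias_act_2d(x.reshape(-1, x.shape[-1]), bias)
    return out.reshape(*lead_shape, out.shape[-1])


# [batch, seq_len, hidden] input goes in and comes out with the same rank.
x = np.arange(-12, 12, dtype=np.float32).reshape(2, 3, 4)
bias = np.ones(4, dtype=np.float32)
y = bias_act_nd(x, bias)
```

With this shape handling inside the wrapper, `compute_activation` could pass `ffn1_out` through unchanged.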
…nto DiT_FFN_fineGrained 'merge develop for push'
…addleMIX into DiT_FFN_fineGrained 'merge myRepo develop for push'
# To speed up this code, call zkk and let him run for you,
# then you will get a speed increase of almost 100%.
os.environ['callZKK'] = "True"
Please rename this environment variable to something else; how about optimize_inference_for_ditllama?
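For illustration, a descriptively named flag could be read through a small helper like the one below. This is a sketch, not the PR's code: `env_flag` and `use_fused_path` are hypothetical names, and the set of accepted truthy strings is an assumption.

```python
import os


def env_flag(name, default=False):
    # Hypothetical helper: interpret an environment variable as a boolean.
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Opt in using the name proposed in the review comment.
os.environ["optimize_inference_for_ditllama"] = "True"
use_fused_path = env_flag("optimize_inference_for_ditllama")
```

Parsing the value, rather than only checking that the variable exists, lets users disable the optimized path by setting it to "False" or "0".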
DiT with decorator, triton fused_AdaLN/fused_rotary_emb, horizontal fusion qkv and fineGrained ffn.
Setup: 25 steps + 256*256 + new IR + mean of 5 end-to-end runs
3B final latency: 581 ms (+61.2%)
7B final latency: 926 ms (+41.4%)
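A measurement of the kind described (mean of 5 end-to-end runs) might be collected with a harness along these lines. This is a sketch under assumptions: `mean_latency_ms` and `fake_pipeline` are hypothetical names, the warmup run is an addition not mentioned in the PR, and a real GPU pipeline would also need a device synchronization before the timer is stopped.

```python
import time
from statistics import mean


def mean_latency_ms(fn, runs=5, warmup=1):
    # Run a few untimed warmup calls so one-off setup cost is excluded.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples), samples


def fake_pipeline():
    # Placeholder for one end-to-end inference call.
    time.sleep(0.01)


avg_ms, samples = mean_latency_ms(fake_pipeline)
print(f"mean latency over {len(samples)} runs: {avg_ms:.1f} ms")
```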