replaced call to _prepare_decoder_attention_mask() with _prepare_4d_causal_attention_mask()
#2553
Job | Run time |
---|---|
 | 4m 18s |
 | 4m 23s |
 | 3m 32s |
 | 3m 36s |
 | 20m 15s |
 | 20m 15s |
 | 4m 37s |
 | 3m 28s |
Total | 1h 4m 24s |