Forward output NaN, GatedLinearAttention
Describe the Bug

```python
o, recurrent_state = fused_chunk_gla(
    q=q, k=k, v=v, g=gk,
    initial_state=recurrent_state,
    output_final_state=use_cache,
    head_first=False,
)
if torch.isnan(o).any():
    breakpoint()  # the original used `1==1` as a no-op anchor for a debugger stop here
```
Steps to Reproduce the Bug

head = 1, seq_len = 43884, dim = 64
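For reference, a self-contained sketch with the reported shapes. The actual input tensors are not included in the issue, so the random q/k/v and the log-sigmoid gates below are assumptions; only the shapes (head = 1, seq_len = 43884, dim = 64) come from the report:

```python
import torch
from fla.ops.gla import fused_chunk_gla

B, H, T, D = 1, 1, 43884, 64           # batch, heads, seq_len, head dim (shapes from the report)
device, dtype = "cuda", torch.bfloat16

# Assumed inputs: with head_first=False, tensors are laid out as (B, T, H, D);
# gk holds forget gates in log space, which is what GLA expects.
q = torch.randn(B, T, H, D, device=device, dtype=dtype)
k = torch.randn(B, T, H, D, device=device, dtype=dtype)
v = torch.randn(B, T, H, D, device=device, dtype=dtype)
gk = torch.nn.functional.logsigmoid(
    torch.randn(B, T, H, D, device=device, dtype=torch.float32)
).to(dtype)

o, _ = fused_chunk_gla(
    q=q, k=k, v=v, g=gk,
    initial_state=None,
    output_final_state=False,
    head_first=False,
)
print("NaNs in output:", torch.isnan(o).any().item())
```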
Expected Behavior

There should be no NaNs in the forward output.
@980202006 Hello, could you provide the input tensors resulting in NaNs?
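One way to capture those tensors is to dump the inputs to disk the first time NaNs appear and attach the file to the issue. A sketch; the wrapper name and file path are illustrative:

```python
import torch
from fla.ops.gla import fused_chunk_gla

def checked_fused_chunk_gla(q, k, v, gk, initial_state, use_cache):
    """Call fused_chunk_gla and save the exact inputs if the output contains NaNs."""
    o, new_state = fused_chunk_gla(
        q=q, k=k, v=v, g=gk,
        initial_state=initial_state,
        output_final_state=use_cache,
        head_first=False,
    )
    if torch.isnan(o).any():
        torch.save(
            {"q": q.detach().cpu(), "k": k.detach().cpu(),
             "v": v.detach().cpu(), "g": gk.detach().cpu(),
             "initial_state": None if initial_state is None
                              else initial_state.detach().cpu()},
            "gla_nan_inputs.pt",  # hypothetical path; attach this file to the issue
        )
        raise RuntimeError("NaN in fused_chunk_gla output; inputs saved to gla_nan_inputs.pt")
    return o, new_state
```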
BTW, it's recommended to use `chunk` mode instead of `fused_chunk`.
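The switch is a drop-in swap of the kernel entry point, assuming `chunk_gla` is exported from `fla.ops.gla` alongside `fused_chunk_gla` and accepts the same keyword arguments (verify against your installed version):

```python
from fla.ops.gla import chunk_gla  # chunk-mode GLA kernel

# Same call site as before; only the entry point changes.
o, recurrent_state = chunk_gla(
    q=q, k=k, v=v, g=gk,
    initial_state=recurrent_state,
    output_final_state=use_cache,
    head_first=False,
)
```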
[GLA] Fix potential exp overflows in fused_chunk (#122) (commit 458c018)
@980202006 Could you check 458c018 again?