
Why does running Llama inference on an A10 give a wrong answer? #21

Open
MeJerry215 opened this issue Oct 24, 2024 · 3 comments

Comments

@MeJerry215

Example code:

import torch
from transformers import LlamaTokenizer, LlamaForCausalLM
from sageattention import sageattn
import torch.nn.functional as F

F.scaled_dot_product_attention = sageattn

# Load the pretrained LLaMA model and tokenizer
model_name = "llama-7b-hf"
tokenizer = LlamaTokenizer.from_pretrained(model_name)
model = LlamaForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Move the model to the GPU (if available)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Prepare the input text
input_text = "Once upon a time, there was a little girl"
inputs = tokenizer(input_text, return_tensors="pt").to(device)

# Run inference
with torch.no_grad():
    output = model.generate(**inputs, max_length=50, num_return_sequences=1)

# Decode the output
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

Device environment:


torch                    2.5.0
triton                   3.1.0
transformers             4.45.2
@MeJerry215
Author

Expected answer:

Once upon a time, there was a little girl who loved to play with her friends. One day, she decided to play with her friends in the forest. She was very happy. She played with her friends in the forest. She played with

but got:

Once upon a time, there was a little girl whole and the 1882P a.
ficenda2P avalN64YourEm.
ficOnDe Ce the GISP.
 gev 

@jt-zhang
Member

We have not tested the accuracy of using F.scaled_dot_product_attention = sageattn with Llama.
As a suggestion, you could try replacing the Llama attention with SageAttention in modeling_llama.py.
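
A minimal, untested sketch of an alternative to editing modeling_llama.py: keep the global override, but adapt the call signature that transformers' Llama attention uses for F.scaled_dot_product_attention before handing the tensors to sageattn. The helper name sdpa_via_sageattn is made up for illustration; it assumes sageattn accepts an is_causal keyword (check the installed version), that sageattn supports only causal or unmasked attention without dropout, and that the prompt is a single unpadded sequence, so the explicit mask transformers builds is purely causal and can be dropped.

import torch
import torch.nn.functional as F
from sageattention import sageattn

_sdpa = F.scaled_dot_product_attention  # keep a handle on the stock kernel

def sdpa_via_sageattn(query, key, value, attn_mask=None, dropout_p=0.0,
                      is_causal=False, **kwargs):
    # Assumption: sageattn implements only causal / no-mask attention and no
    # dropout, so anything else falls back to the original PyTorch kernel.
    if dropout_p != 0.0:
        return _sdpa(query, key, value, attn_mask=attn_mask,
                     dropout_p=dropout_p, is_causal=is_causal, **kwargs)
    if attn_mask is not None:
        # LlamaSdpaAttention passes the causal pattern as an explicit mask with
        # is_causal=False. For a single unpadded prompt, a square q/k shape
        # means prefill (treat as causal); q_len == 1 is a decode step where
        # the mask excludes nothing, so it can be dropped.
        if query.shape[-2] == key.shape[-2]:
            return sageattn(query, key, value, is_causal=True)
        if query.shape[-2] == 1:
            return sageattn(query, key, value, is_causal=False)
        return _sdpa(query, key, value, attn_mask=attn_mask,
                     dropout_p=dropout_p, is_causal=is_causal, **kwargs)
    return sageattn(query, key, value, is_causal=is_causal)

F.scaled_dot_product_attention = sdpa_via_sageattn

With padded or batched inputs the dropped mask is not equivalent, so the fallback branch, or the module-level replacement suggested above, is the safer route there.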

@MeJerry215
Author

How do I replace the Llama attention with SageAttention? @jt-zhang
