RuntimeError: expected scalar type Float but found BFloat16 #2

ScottishFold007 · 2023-04-10T10:14:43Z

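Hello! Thank you very much for your code. I adapted it for the bloom model, but ran into an error. The code involved and the error are shown below.
Note that I set bf16 to True.

Could you please take a look at what causes this and what a possible fix might be? Thank you!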

kyleliang919 · 2023-04-10T14:50:44Z

You need to change the type in the flash attn wrapper:

Long-context-transformers/flash_attn_wrappers.py

Line 201 in ddda2ce

    
           qkv = torch.concat([reshaped_query_layer.unsqueeze(2), offset_key_layer.unsqueeze(2), reshaped_value_layer.unsqueeze(2)], dim = 2).half()

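If you are using bloom, the cast here should be bf16 rather than half. Try changing it, and add --bf16 to the launch command. I have tested this and it works.
Sorry, I may add a bf16 option in a few days; I have been a bit swamped lately. If you test bf16 and it works, a pull request would be very welcome.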
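
As a minimal sketch of that change, the cast can take the target dtype as a parameter instead of hard-coding `.half()`. The helper name `pack_qkv` and its `dtype` argument are hypothetical and not part of the repository, and the tensor layout (batch, seq_len, num_heads, head_dim) is an assumption.

```python
import torch


def pack_qkv(query, key, value, dtype=torch.bfloat16):
    """Stack q, k, v along a new dim 2 and cast once to the requested dtype.

    Hypothetical helper: mirrors the concat on line 201, but replaces the
    hard-coded .half() with a configurable cast.
    """
    qkv = torch.cat(
        [query.unsqueeze(2), key.unsqueeze(2), value.unsqueeze(2)], dim=2
    )
    return qkv.to(dtype)


# Assumed layout: (batch, seq_len, num_heads, head_dim).
q = torch.randn(2, 5, 4, 8)
k = torch.randn(2, 5, 4, 8)
v = torch.randn(2, 5, 4, 8)
qkv = pack_qkv(q, k, v, dtype=torch.bfloat16)
print(qkv.shape, qkv.dtype)
```

Passing `torch.float16` instead would reproduce the original behavior, so the same code path could serve both fp16 and bf16 models.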

ScottishFold007 · 2023-04-11T03:48:36Z

Replying to kyleliang919's suggestion above (change the cast on line 201 of flash_attn_wrappers.py from half to bf16 and pass --bf16):

Thanks for the pointer. I made the change you suggested, but the same problem still occurs:

My command line is as follows:

torchrun --nproc_per_node=8 --master_port=9999 train_revised_for_bloom_flash_attention.py \
    --model_name_or_path /data/models/bigscience_bloom-1b/ \
    --data_path /data/datasets/instruction_data/share_chatgpt_chat/merged_file.json \
    --bf16 True \
    --model_cache_dir /data/model_cache/ \
    --loading_cache_dir /data/data_load_cache/ \
    --test_size 0.05 \
    --do_train True \
    --do_eval True \
    --overwrite_output_dir True \
    --output_dir /data/data_cleaned_1b_v1/ \
    --num_train_epochs 2 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 100 \
    --eval_steps 100  \
    --save_total_limit 1 \
    --learning_rate 2e-6 \
    --weight_decay 0.001 \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 100 \
    --fsdp "full_shard auto_wrap" \
    --tf32 True \
    --max_length 2048 \
    --use_flash_attention True \
    --gradient_checkpointing False  \
    --lazy_preprocess True

This is running on a single machine with eight GPUs.

ScottishFold007 · 2023-04-11T03:56:38Z

I wonder whether this is caused by multi-GPU training with FSDP.

ScottishFold007 · 2023-04-11T04:06:52Z

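Also, it does run with deepspeed. But for a small 1b model with a sequence length of 2048 on a single machine with A100 40G * 8, is such high GPU memory usage normal? It is also very slow. Is that expected?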

kyleliang919 · 2023-04-11T15:53:15Z

Has the fp16 issue above been resolved? If not, try looking for the float16 downcasts in the wrapper and replacing all of them with bfloat16.
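
To illustrate why a single leftover cast matters: `torch.matmul` does not promote mixed floating-point dtypes, so a float32 operand meeting a bfloat16 one raises a RuntimeError (the exact message varies across PyTorch versions and operators). A minimal CPU sketch with made-up tensors:

```python
import torch

a = torch.randn(4, 8)                        # float32, e.g. an upcast intermediate
b = torch.randn(8, 4, dtype=torch.bfloat16)  # bf16, e.g. a model activation

try:
    torch.matmul(a, b)
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("dtype mismatch:", err)

# Casting both operands to the same dtype resolves the mismatch.
out = torch.matmul(a.to(torch.bfloat16), b)
print(out.dtype)
```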

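This repo is optimized for memory rather than speed, so it uses CPU offloading, which makes it somewhat slower. The high memory usage is probably due to:
--per_device_train_batch_size 4
If it is too slow for you, try editing the deepspeed config to remove some of the offloading, since the memory does not appear to be fully used yet.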
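
A rough sketch of removing part of the offloading, assuming a ZeRO stage 3 config that uses DeepSpeed's `offload_optimizer` and `offload_param` sections. The example config values and the helper `without_offload` are hypothetical; the repository's actual config may look different.

```python
import copy
import json


def without_offload(config, keep=()):
    """Return a copy of a DeepSpeed config dict with ZeRO offload entries removed.

    Entries named in `keep` are left in place.
    """
    cfg = copy.deepcopy(config)
    zero = cfg.get("zero_optimization", {})
    for key in ("offload_optimizer", "offload_param"):
        if key not in keep:
            zero.pop(key, None)
    return cfg


# Made-up example config, not the one shipped with the repo.
example = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
}

# Keep parameter offloading, drop optimizer offloading.
trimmed = without_offload(example, keep=("offload_param",))
print(json.dumps(trimmed, indent=2))
```

Dropping one section at a time makes it easier to see how much speed is gained for how much extra GPU memory.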
Another issue is that the flash attn ALiBi bias for bloom requires computing a pseudo-inverse:

Long-context-transformers/flash_attn_wrappers.py

Line 200 in ddda2ce

    
           offset_key_layer = self.attention.inv_norm_factor * reshaped_key_layer + self.attention.beta * (torch.linalg.pinv(reshaped_query_layer.permute(0,2,1,3).float()) * alibi.view(batch_size, alibi.shape[0]//batch_size, alibi.shape[1], alibi.shape[2])).permute(0, 3, 1, 2).half()

So it quite possibly does not save any memory. There should be a way to avoid this approximation, but it requires modifying the flash attn CUDA kernel.
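
One way to see where an approximation of this kind comes from: when the sequence length exceeds the head dimension, `Q @ pinv(Q)` is only a low-rank projection rather than the identity, so a bias routed through `pinv(Q)` cannot be reproduced exactly in general. The sketch below checks this numerically with random data. It uses a plain matrix product and made-up shapes, and is not the repository's exact computation.

```python
import torch

torch.manual_seed(0)
seq_len, head_dim = 16, 4
Q = torch.randn(seq_len, head_dim, dtype=torch.float64)

# Projector onto the column space of Q; its trace equals its rank.
P = Q @ torch.linalg.pinv(Q)
trace = torch.trace(P).item()

# Try to fold an arbitrary additive bias into an offset for K^T.
bias = torch.randn(seq_len, seq_len, dtype=torch.float64)
k_offset_t = torch.linalg.pinv(Q) @ bias   # least-squares solution
recovered = Q @ k_offset_t                 # equals P @ bias, not bias itself
rel_err = ((recovered - bias).norm() / bias.norm()).item()
print(trace, rel_err)
```

The intermediate `pinv` result and the float32 upcast also have to be materialized, which is consistent with the concern that the trick may not reduce memory.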

ScottishFold007 · 2023-04-14T04:34:34Z

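Sorry, I have been busy with other work these past few days. When running with deepspeed, the "RuntimeError: expected scalar type Float but found BFloat16" error does not occur, but with transformers' FSDP it does. I think a configurable parameter could be added to choose between fp16, float32, and bf16 and switch the dtype globally, which would avoid this problem.
Also, if you manage to solve the issue you described ("it quite possibly does not save any memory; there should be a way to avoid this approximation, but it requires modifying the flash attn CUDA kernel"), that would be very impressive. The GPU memory consumed at this sequence length could then come down.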
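
Such an option could be sketched as a single precision flag that is resolved to a `torch.dtype` once and then passed to every cast. The flag name `--precision` and the helper `resolve_dtype` are hypothetical; they are not part of this repository or of transformers.

```python
import argparse

import torch

_DTYPES = {
    "fp32": torch.float32,
    "fp16": torch.float16,
    "bf16": torch.bfloat16,
}


def resolve_dtype(name):
    """Map a precision name to a torch dtype, rejecting unknown names."""
    try:
        return _DTYPES[name]
    except KeyError:
        raise ValueError(f"unsupported precision: {name!r}") from None


parser = argparse.ArgumentParser()
parser.add_argument("--precision", choices=sorted(_DTYPES), default="bf16")
args = parser.parse_args(["--precision", "bf16"])  # stand-in for real CLI args

compute_dtype = resolve_dtype(args.precision)
x = torch.randn(2, 3).to(compute_dtype)  # every cast uses the same dtype
print(x.dtype)
```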

ScottishFold007 · 2023-04-14T04:36:10Z

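One more question I would like to discuss with you: the ALiBi paper is titled "Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation". In theory, a model trained at length 512 can be used at 2048 or 3096 at inference time without being specifically trained for those lengths. What is your understanding of this?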
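
For context, ALiBi's bias is a fixed, head-specific linear penalty on the query-key distance, with no learned position parameters tied to a particular length. A minimal pure-Python sketch, assuming the paper's geometric slope schedule for a power-of-two head count (the function names are illustrative):

```python
def alibi_slopes(num_heads):
    """Slopes 2^(-8/n), 2^(-16/n), ... for n heads (n assumed a power of two)."""
    start = 2.0 ** (-8.0 / num_heads)
    return [start ** (i + 1) for i in range(num_heads)]


def alibi_bias(slope, seq_len):
    """Causal bias matrix: entry [i][j] = -slope * (i - j) for j <= i."""
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


slopes = alibi_slopes(8)
short = alibi_bias(slopes[0], 4)
long = alibi_bias(slopes[0], 8)

# The bias for a short sequence is the top-left block of the longer one,
# so the same formula applies unchanged at any length.
prefix_matches = all(long[i][:4] == short[i] for i in range(4))
print(slopes[:3], prefix_matches)
```

Whether model quality actually holds at the longer lengths is an empirical question that this sketch does not address.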

ScottishFold007 · 2023-04-14T04:49:23Z

Replying to kyleliang919's comment above (the pseudo-inverse computation on line 200 of flash_attn_wrappers.py, and the remark that avoiding the approximation would require modifying the flash attn CUDA kernel):

Would this kind of modification be feasible?

kyleliang919 · 2023-04-14T17:44:43Z

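But the backward pass will quite likely still need the same amount of memory, and the approximation would probably be even less accurate this way. That said, fine-tuning might be able to correct those errors.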

swltown · 2023-04-16T08:01:07Z

Replying to ScottishFold007's comment above:

Hello, I ran into a similar problem when trying to apply LoRA to GPT2:
RuntimeError: expected scalar type Half but found Float
I saw that you closed this issue in Hugging Face's peft library. Have you solved it? If so, could you share how? Thanks a lot.
