GGUF support for BERT architecture #34238

Open · Dimmension opened this issue Oct 18, 2024 · 1 comment
Labels: Feature request (Request for a new feature)

Dimmension commented Oct 18, 2024

Feature request

I want to add the ability to use GGUF BERT models in transformers.
Currently the library does not support this architecture; attempting to load one fails with TypeError: Architecture 'bert' is not supported.
I have done most of the mapping, but I am having difficulty with a few fields.
Can anybody help me out or comment on this feature?
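For reference, a minimal reproduction of the failure, assuming a quantized BERT GGUF checkpoint (the repo id and file name below are placeholders):

from transformers import AutoModel

# Hypothetical repo id and GGUF file name, for illustration only.
model = AutoModel.from_pretrained(
    "some-org/bert-base-uncased-gguf",
    gguf_file="bert-base-uncased.Q8_0.gguf",
)
# Raises: TypeError: Architecture 'bert' is not supported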

Motivation

I ran into a problem: I can't use GGUF models in Rasa (Rasa uses the standard from_pretrained), so I decided to add BERT support.

Your contribution

Here is my extended ggml.py file:

GGUF_TENSOR_MAPPING = {
    "bert": {
        "context_length": "max_position_embeddings",
        "block_count": "num_hidden_layers",
        "feed_forward_length": "intermediate_size",
        "embedding_length": "hidden_size",
        "attention.head_cgguf>=0.10.0ount": "num_attention_heads",
        "attention.layer_norm_rms_epsilon": "rms_norm_eps",
        # "attention.causal": "",
        # "pooling_type": "",
        "vocab_size": "vocab_size",
    }
}
 
GGUF_CONFIG_MAPPING = {
    "bert": {
        "context_length": "max_position_embeddings",
        "block_count": "num_hidden_layers",
        "feed_forward_length": "intermediate_size",
        "embedding_length": "hidden_size",
        "attention.head_cgguf>=0.10.0ount": "num_attention_heads",
        "attention.layer_norm_rms_epsilon": "rms_norm_eps",
        # "attention.causal": "",
        # "pooling_type": "",
        "vocab_size": "vocab_size",
    }
}
 
GGUF_TOKENIZER_MAPPING = {
    "tokenizer": {
        # "ggml.token_type_count": "",
        # "ggml.pre": "",
        "ggml.model": "tokenizer_type",
        "ggml.tokens": "all_special_tokens",
        "ggml.token_type": "all_special_ids",
        "ggml.unknown_token_id": "unk_token_id",
        "ggml.seperator_token_id": "sep_token_id",
        "ggml.padding_token_id": "pad_token_id",
        "ggml.cls_token_id": "cls_token_id",
        "ggml.mask_token_id": "mask_token_id",
    },
    "tokenizer_config": {       
        "ggml.unknown_token_id": "unk_token_id",
        "ggml.seperator_token_id": "sep_token_id",
        "ggml.padding_token_id": "pad_token_id",
        "ggml.cls_token_id": "cls_token_id",
        "ggml.mask_token_id": "mask_token_id",
    },
}
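To show how the config mapping above might be consumed, here is a rough sketch; the shape of the metadata dict is an assumption (GGUF keys with the "bert." architecture prefix already stripped), and the actual extraction from the file is left out:

def gguf_to_bert_config_kwargs(metadata: dict) -> dict:
    # Translate GGUF metadata keys into transformers BertConfig kwargs
    # using the GGUF_CONFIG_MAPPING table defined above.
    mapping = GGUF_CONFIG_MAPPING["bert"]
    return {mapping[key]: value for key, value in metadata.items() if key in mapping}

# Example:
# {"context_length": 512, "block_count": 12, "embedding_length": 768}
# -> {"max_position_embeddings": 512, "num_hidden_layers": 12, "hidden_size": 768}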
VladOS95-cyber (Contributor) commented Oct 18, 2024

Hi @Dimmension, there is a dedicated open issue, #33260. You could take a look at how other architectures and tests were added and follow the same logic. Useful links you may need: https://github.com/ggerganov/llama.cpp/blob/master/convert_hf_to_gguf.py, where you can find the tensor and config conversion logic, and https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/gguf/tensor_mapping.py for the mapping needed to correctly rename all tensors back.
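To illustrate the renaming direction the links above cover, here is a hedged sketch of mapping GGUF per-layer tensor names back to transformers BERT names; the specific name pairs are illustrative guesses and should be verified against tensor_mapping.py:

import re

# Illustrative per-layer pairs (guesses, not confirmed against llama.cpp).
GGUF_TO_HF_BERT = {
    "attn_q": "attention.self.query",
    "attn_k": "attention.self.key",
    "attn_v": "attention.self.value",
    "attn_output": "attention.output.dense",
    "ffn_up": "intermediate.dense",
    "ffn_down": "output.dense",
}

def rename_tensor(gguf_name: str) -> str:
    # Map a GGUF per-layer tensor name ("blk.<n>.<name>.<kind>") back to
    # the corresponding transformers BERT parameter name.
    match = re.match(r"blk\.(\d+)\.(\w+)\.(weight|bias)$", gguf_name)
    if match is None:
        return gguf_name  # embeddings, norms, etc. handled separately
    layer, name, kind = match.groups()
    hf_name = GGUF_TO_HF_BERT.get(name)
    if hf_name is None:
        return gguf_name
    return f"encoder.layer.{layer}.{hf_name}.{kind}"

# e.g. rename_tensor("blk.0.attn_q.weight")
# -> "encoder.layer.0.attention.self.query.weight"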
