
Model fails with AttributeError: 'ReplitLMTokenizer' object has no attribute 'sp_model' #46

ShisuiMadara opened this issue Jan 13, 2024 · 1 comment
Here is the full error output:
2024-01-13 22:58:35,407: rank0[1152][MainThread]: INFO: __main__: Building tokenizer...
Traceback (most recent call last):
  File "/llm-foundry/scripts/train/train.py", line 653, in <module>
    main(cfg)
  File "/llm-foundry/scripts/train/train.py", line 454, in main
    tokenizer = build_tokenizer(tokenizer_name, tokenizer_kwargs)
  File "/llm-foundry/llmfoundry/utils/builders.py", line 404, in build_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name,
  File "/usr/lib/python3/dist-packages/transformers/models/auto/tokenization_auto.py", line 774, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/usr/lib/python3/dist-packages/transformers/tokenization_utils_base.py", line 2028, in from_pretrained
    return cls._from_pretrained(
  File "/usr/lib/python3/dist-packages/transformers/tokenization_utils_base.py", line 2260, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/replit/replit-code-v1-3b/cc0a4f17a8d72b71d62ea53cb0e23e4dac352067/replit_lm_tokenizer.py", line 66, in __init__
    super().__init__(bos_token=bos_token, eos_token=eos_token, unk_token=unk_token, pad_token=pad_token, sep_token=sep_token, sp_model_kwargs=self.sp_model_kwargs, **kwargs)
  File "/usr/lib/python3/dist-packages/transformers/tokenization_utils.py", line 367, in __init__
    self._add_tokens(
  File "/usr/lib/python3/dist-packages/transformers/tokenization_utils.py", line 467, in _add_tokens
    current_vocab = self.get_vocab().copy()
  File "/root/.cache/huggingface/modules/transformers_modules/replit/replit-code-v1-3b/cc0a4f17a8d72b71d62ea53cb0e23e4dac352067/replit_lm_tokenizer.py", line 76, in get_vocab
    vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
  File "/root/.cache/huggingface/modules/transformers_modules/replit/replit-code-v1-3b/cc0a4f17a8d72b71d62ea53cb0e23e4dac352067/replit_lm_tokenizer.py", line 73, in vocab_size
    return self.sp_model.get_piece_size()
AttributeError: 'ReplitLMTokenizer' object has no attribute 'sp_model'

I am using the following versions:
Python - 3.10.13
Transformers - 4.36.0

I followed each step as described in llm-foundry and am running the instance in a Docker container. The same error occurs when running the model directly from Hugging Face.
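If I read the traceback correctly, the base tokenizer constructor in recent Transformers releases (around 4.34, if I recall correctly) calls self.get_vocab() via _add_tokens while still inside super().__init__(), but replit_lm_tokenizer.py only assigns self.sp_model after that call, so the lookup fails. Below is a minimal sketch of how the constructor could be reordered so the SentencePiece model is loaded first. The class name, vocab file name, and default special-token strings here are illustrative assumptions, not the exact values in the Replit repo:

```python
# Hypothetical sketch of a reordered __init__ for replit_lm_tokenizer.py.
# Class name, vocab file name, and default special tokens are illustrative.
import sentencepiece as spm
from transformers import PreTrainedTokenizer


class PatchedReplitLMTokenizer(PreTrainedTokenizer):
    vocab_files_names = {"vocab_file": "spiece.model"}  # assumed file name

    def __init__(self, vocab_file, bos_token=None, eos_token="<|endoftext|>",
                 unk_token="<|unk|>", pad_token="<|pad|>", sep_token=None,
                 sp_model_kwargs=None, **kwargs):
        self.sp_model_kwargs = {} if sp_model_kwargs is None else sp_model_kwargs
        self.vocab_file = vocab_file
        # Load the SentencePiece model *before* calling super().__init__():
        # the base constructor in newer Transformers calls self.get_vocab(),
        # which needs self.sp_model to already exist.
        self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
        self.sp_model.Load(vocab_file)
        super().__init__(bos_token=bos_token, eos_token=eos_token,
                         unk_token=unk_token, pad_token=pad_token,
                         sep_token=sep_token,
                         sp_model_kwargs=self.sp_model_kwargs, **kwargs)

    @property
    def vocab_size(self):
        return self.sp_model.get_piece_size()

    def get_vocab(self):
        return {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}

    def _tokenize(self, text):
        return self.sp_model.encode(text, out_type=str)

    def _convert_token_to_id(self, token):
        return self.sp_model.piece_to_id(token)

    def _convert_id_to_token(self, index):
        return self.sp_model.id_to_piece(index)
```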


ShisuiMadara commented Jan 20, 2024

@madhavatreplit sorry for the ping, but you seem to be the only one responding in this repository. Is there any chance this will be fixed in an upcoming update? The model on Hugging Face is also not working; I think this is due to the Transformers version. I followed the LLM Foundry steps to fine-tune the model.
Kindly look into it.
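A possible interim workaround, assuming the cause really is the constructor-ordering change in newer Transformers, would be to pin an older release (e.g. `pip install "transformers<4.34"`), or to load a local clone of replit-code-v1-3b whose replit_lm_tokenizer.py has the reordered __init__ sketched above. The path below is a placeholder:

```python
# Hypothetical usage: load the tokenizer from a local clone whose
# tokenizer code has been patched; the path is a placeholder.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "/path/to/local/replit-code-v1-3b",  # local copy with patched replit_lm_tokenizer.py
    trust_remote_code=True,              # needed to load the custom tokenizer class
)
print(tokenizer("def hello_world():")["input_ids"])
```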
