
Model fails with AttributeError: 'ReplitLMTokenizer' object has no attribute 'sp_model' #46

ShisuiMadara opened this issue Jan 13, 2024 · 1 comment
Here is the full error output:
2024-01-13 22:58:35,407: rank0[1152][MainThread]: INFO: __main__: Building tokenizer...
Traceback (most recent call last):
  File "/llm-foundry/scripts/train/train.py", line 653, in <module>
    main(cfg)
  File "/llm-foundry/scripts/train/train.py", line 454, in main
    tokenizer = build_tokenizer(tokenizer_name, tokenizer_kwargs)
  File "/llm-foundry/llmfoundry/utils/builders.py", line 404, in build_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name,
  File "/usr/lib/python3/dist-packages/transformers/models/auto/tokenization_auto.py", line 774, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/usr/lib/python3/dist-packages/transformers/tokenization_utils_base.py", line 2028, in from_pretrained
    return cls._from_pretrained(
  File "/usr/lib/python3/dist-packages/transformers/tokenization_utils_base.py", line 2260, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/replit/replit-code-v1-3b/cc0a4f17a8d72b71d62ea53cb0e23e4dac352067/replit_lm_tokenizer.py", line 66, in __init__
    super().__init__(bos_token=bos_token, eos_token=eos_token, unk_token=unk_token, pad_token=pad_token, sep_token=sep_token, sp_model_kwargs=self.sp_model_kwargs, **kwargs)
  File "/usr/lib/python3/dist-packages/transformers/tokenization_utils.py", line 367, in __init__
    self._add_tokens(
  File "/usr/lib/python3/dist-packages/transformers/tokenization_utils.py", line 467, in _add_tokens
    current_vocab = self.get_vocab().copy()
  File "/root/.cache/huggingface/modules/transformers_modules/replit/replit-code-v1-3b/cc0a4f17a8d72b71d62ea53cb0e23e4dac352067/replit_lm_tokenizer.py", line 76, in get_vocab
    vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
  File "/root/.cache/huggingface/modules/transformers_modules/replit/replit-code-v1-3b/cc0a4f17a8d72b71d62ea53cb0e23e4dac352067/replit_lm_tokenizer.py", line 73, in vocab_size
    return self.sp_model.get_piece_size()
AttributeError: 'ReplitLMTokenizer' object has no attribute 'sp_model'

I am using the following versions:
Python - 3.10.13
Transformers - 4.36.0

I followed each step as described in llm-foundry and am running the instance in a Docker container. The same error occurs when running the model directly from Hugging Face.
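If I read the traceback correctly, the base tokenizer constructor in recent Transformers releases (around 4.34, if I recall correctly) calls self.get_vocab() via _add_tokens while still inside super().__init__(), but replit_lm_tokenizer.py only assigns self.sp_model after that call, so the lookup fails. Below is a minimal sketch of how the constructor could be reordered so the SentencePiece model is loaded first. The class name, vocab file name, and default special-token strings here are illustrative assumptions, not the exact values in the Replit repo:

```python
# Hypothetical sketch of a reordered __init__ for replit_lm_tokenizer.py.
# Class name, vocab file name, and default special tokens are illustrative.
import sentencepiece as spm
from transformers import PreTrainedTokenizer


class PatchedReplitLMTokenizer(PreTrainedTokenizer):
    vocab_files_names = {"vocab_file": "spiece.model"}  # assumed file name

    def __init__(self, vocab_file, bos_token=None, eos_token="<|endoftext|>",
                 unk_token="<|unk|>", pad_token="<|pad|>", sep_token=None,
                 sp_model_kwargs=None, **kwargs):
        self.sp_model_kwargs = {} if sp_model_kwargs is None else sp_model_kwargs
        self.vocab_file = vocab_file
        # Load the SentencePiece model *before* calling super().__init__():
        # the base constructor in newer Transformers calls self.get_vocab(),
        # which needs self.sp_model to already exist.
        self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
        self.sp_model.Load(vocab_file)
        super().__init__(bos_token=bos_token, eos_token=eos_token,
                         unk_token=unk_token, pad_token=pad_token,
                         sep_token=sep_token,
                         sp_model_kwargs=self.sp_model_kwargs, **kwargs)

    @property
    def vocab_size(self):
        return self.sp_model.get_piece_size()

    def get_vocab(self):
        return {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}

    def _tokenize(self, text):
        return self.sp_model.encode(text, out_type=str)

    def _convert_token_to_id(self, token):
        return self.sp_model.piece_to_id(token)

    def _convert_id_to_token(self, index):
        return self.sp_model.id_to_piece(index)
```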


ShisuiMadara commented Jan 20, 2024

@madhavatreplit sorry for the ping, but you seem to be the only one responding in this repository. Is there any chance this will be fixed in an upcoming update? The model on Hugging Face is also not working; I think this is due to the Transformers version. I followed the LLM Foundry steps to fine-tune the model.
Kindly look into it.
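A possible interim workaround, assuming the cause really is the constructor-ordering change in newer Transformers, would be to pin an older release (e.g. `pip install "transformers<4.34"`), or to load a local clone of replit-code-v1-3b whose replit_lm_tokenizer.py has the reordered __init__ sketched above. The path below is a placeholder:

```python
# Hypothetical usage: load the tokenizer from a local clone whose
# tokenizer code has been patched; the path is a placeholder.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "/path/to/local/replit-code-v1-3b",  # local copy with patched replit_lm_tokenizer.py
    trust_remote_code=True,              # needed to load the custom tokenizer class
)
print(tokenizer("def hello_world():")["input_ids"])
```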
