Here is the full error output:

```
2024-01-13 22:58:35,407: rank0[1152][MainThread]: INFO: __main__: Building tokenizer...
Traceback (most recent call last):
  File "/llm-foundry/scripts/train/train.py", line 653, in <module>
    main(cfg)
  File "/llm-foundry/scripts/train/train.py", line 454, in main
    tokenizer = build_tokenizer(tokenizer_name, tokenizer_kwargs)
  File "/llm-foundry/llmfoundry/utils/builders.py", line 404, in build_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name,
  File "/usr/lib/python3/dist-packages/transformers/models/auto/tokenization_auto.py", line 774, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/usr/lib/python3/dist-packages/transformers/tokenization_utils_base.py", line 2028, in from_pretrained
    return cls._from_pretrained(
  File "/usr/lib/python3/dist-packages/transformers/tokenization_utils_base.py", line 2260, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/replit/replit-code-v1-3b/cc0a4f17a8d72b71d62ea53cb0e23e4dac352067/replit_lm_tokenizer.py", line 66, in __init__
    super().__init__(bos_token=bos_token, eos_token=eos_token, unk_token=unk_token, pad_token=pad_token, sep_token=sep_token, sp_model_kwargs=self.sp_model_kwargs, **kwargs)
  File "/usr/lib/python3/dist-packages/transformers/tokenization_utils.py", line 367, in __init__
    self._add_tokens(
  File "/usr/lib/python3/dist-packages/transformers/tokenization_utils.py", line 467, in _add_tokens
    current_vocab = self.get_vocab().copy()
  File "/root/.cache/huggingface/modules/transformers_modules/replit/replit-code-v1-3b/cc0a4f17a8d72b71d62ea53cb0e23e4dac352067/replit_lm_tokenizer.py", line 76, in get_vocab
    vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
  File "/root/.cache/huggingface/modules/transformers_modules/replit/replit-code-v1-3b/cc0a4f17a8d72b71d62ea53cb0e23e4dac352067/replit_lm_tokenizer.py", line 73, in vocab_size
    return self.sp_model.get_piece_size()
AttributeError: 'ReplitLMTokenizer' object has no attribute 'sp_model'
```
I am using the following versions:
Python - 3.10.13
Transformers - 4.36.0
I followed each step as described in llm-foundry and am running inside the Docker container. The same error occurs when loading the model directly from Hugging Face.
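Reading the traceback, the failure appears to come from an initialization-order mismatch: in recent transformers releases, the base `PreTrainedTokenizer.__init__` calls `_add_tokens`, which calls `self.get_vocab()` before the subclass's own `__init__` has finished, so `ReplitLMTokenizer`'s `self.sp_model` does not exist yet when `vocab_size` needs it. Here is a minimal, self-contained sketch of that mechanism (stub classes, not the real transformers code) and the usual fix of loading the SentencePiece model before calling `super().__init__()`:

```python
# Stub mimicking PreTrainedTokenizer.__init__, which (in newer transformers
# versions) calls get_vocab() via _add_tokens during construction.
class BaseTokenizer:
    def __init__(self):
        self.vocab = self.get_vocab()  # runs BEFORE subclass __init__ finishes

    def get_vocab(self):
        raise NotImplementedError


class BrokenTokenizer(BaseTokenizer):
    """Sets sp_model AFTER super().__init__() -- same failure as the traceback."""
    def __init__(self):
        super().__init__()            # get_vocab() runs here; sp_model missing
        self.sp_model = {"a": 0}      # stand-in for the SentencePiece model

    def get_vocab(self):
        return dict(self.sp_model)


class FixedTokenizer(BaseTokenizer):
    """Sets sp_model BEFORE super().__init__() -- the usual fix pattern."""
    def __init__(self):
        self.sp_model = {"a": 0}      # load the model first, then init the base
        super().__init__()

    def get_vocab(self):
        return dict(self.sp_model)


try:
    BrokenTokenizer()
except AttributeError as e:
    print(f"broken: {e}")  # → broken: 'BrokenTokenizer' object has no attribute 'sp_model'

print(f"fixed: {FixedTokenizer().get_vocab()}")  # → fixed: {'a': 0}
```

If this diagnosis is right, the corresponding fix in `replit_lm_tokenizer.py` would be to move the `sp_model` setup above the `super().__init__(...)` call.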
@madhavatreplit sorry for the ping, but you seem to be the only one responding on this repository. Is there any chance this will be fixed in an update? The model on Hugging Face is also not working; I think this is due to the transformers version. I had followed the steps from LLM Foundry to fine-tune this model.
Kindly look into it.
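Until a fix lands, a workaround that may help is pinning transformers to a release that predates the tokenizer init-order change (I believe the change landed around 4.34, but please verify against the release notes), e.g. `pip install "transformers<4.34"`. A small guard like the sketch below, placed before `build_tokenizer` is called (the function name and boundary are my assumptions), would at least replace the opaque `AttributeError` with an actionable message:

```python
def replit_tokenizer_compatible(transformers_version: str) -> bool:
    """Return True if this transformers version predates the init-order change.

    The (4, 34) boundary is an assumption about where the change landed;
    check the transformers release notes before relying on it.
    """
    major, minor = (int(x) for x in transformers_version.split(".")[:2])
    return (major, minor) < (4, 34)


def check_transformers_for_replit(transformers_version: str) -> None:
    """Raise a clear error instead of the opaque AttributeError."""
    if not replit_tokenizer_compatible(transformers_version):
        raise RuntimeError(
            f"transformers {transformers_version} calls get_vocab() during "
            "tokenizer __init__, which breaks ReplitLMTokenizer; pin "
            'transformers<4.34 or patch the tokenizer to set sp_model '
            "before super().__init__()."
        )
```

Usage would be `check_transformers_for_replit(transformers.__version__)` right before building the tokenizer.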