Commit

Dont predict EOText token
TJ-Solergibert committed Aug 2, 2024
1 parent 2d882db commit d5228bb
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion src/nanotron/data/chat_tokenizer.py
@@ -54,7 +54,9 @@ def __call__(self, conversation: List[dict]) -> Tuple[List[int], List[bool]]:

# Append <|end_of_text|> token
tokens.extend(self.tokenizer.encode("<|end_of_text|>", add_special_tokens=False))
- is_completitions.append(True)
+ is_completitions.append(
+     False
+ )  # NOTE(tj.solergibert) No need to predict <|end_of_text|> token from <|eot_id|> token

return tokens, is_completitions
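
To illustrate the effect of this change, here is a minimal sketch of how a per-token completion mask like `is_completitions` might be turned into next-token training labels. It is not nanotron's actual collator: `build_labels`, `IGNORE_INDEX`, and the token ids are hypothetical, and the sketch assumes `<|eot_id|>` is itself flagged as a completion token.

```python
from typing import List, Tuple

# Assumption: -100 is used as the "ignore" label, as is conventional
# with PyTorch's cross-entropy loss. Hypothetical, not taken from nanotron.
IGNORE_INDEX = -100


def build_labels(tokens: List[int], is_completions: List[bool]) -> Tuple[List[int], List[int]]:
    """Shift tokens by one for next-token prediction and mask non-completion targets."""
    inputs = tokens[:-1]
    # The target at position i is tokens[i + 1]; it contributes to the loss
    # only if that target token is flagged as a completion token.
    labels = [
        tok if keep else IGNORE_INDEX
        for tok, keep in zip(tokens[1:], is_completions[1:])
    ]
    return inputs, labels


# Hypothetical ids: 3 prompt tokens, 2 assistant tokens,
# <|eot_id|> = 90, <|end_of_text|> = 91.
tokens = [10, 11, 12, 20, 21, 90, 91]
# After this commit the final <|end_of_text|> flag is False.
is_completions = [False, False, False, True, True, True, False]

inputs, labels = build_labels(tokens, is_completions)
print(inputs)
print(labels)
```

With the final flag set to `False`, the position holding `<|eot_id|>` receives an ignored label, so under these assumptions the model is not trained to emit `<|end_of_text|>` after `<|eot_id|>`.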
