def tokenize(self, text: str):
    """ tokenize input"""
    words = word_tokenize(text)
    tokens = []
    valid_positions = []
    for i,word in enumerate(words):
        token = self.tokenizer.tokenize(word)
        tokens.extend(token)
        for i in range(len(token)):
            if i == 0:
                valid_positions.append(1)
            else:
                valid_positions.append(0)
    return tokens, valid_positions
What does the third `i` refer to in the `i == 0` check?
Maybe the second for-loop should use a different iteration variable.
The data is in the following format: TOKEN NNP B-NP O
So inside the for-loop, after `self.tokenizer.tokenize(word)` splits a word into sub-tokens, only the first sub-token of each word is marked with a 1; every following sub-token gets a 0. This masks out everything except the first sub-token of each word (see 'attention_mask' on https://huggingface.co/transformers/model_doc/bert.html#bertmodel).
The first `i` (from `enumerate`) is unused, so it is optional. The second and third `i` are the same variable: the inner loop's counter, which shadows the outer one. It was probably written that way just to reuse a name, so renaming the inner loop variable would indeed be clearer.
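A minimal sketch of the masking logic described above, with the inner loop variable renamed as suggested. Note the `stub_subword_tokenize` helper is hypothetical: it stands in for `self.tokenizer.tokenize` (the WordPiece tokenizer in the original code) so the example is self-contained:

```python
def stub_subword_tokenize(word):
    # Hypothetical stand-in for BertTokenizer.tokenize: split words longer
    # than 4 characters into WordPiece-style "##" pieces.
    if len(word) <= 4:
        return [word]
    return [word[:4]] + ["##" + word[i:i + 4] for i in range(4, len(word), 4)]

def tokenize(text):
    """Tokenize input; mark only the first sub-token of each word with 1."""
    words = text.split()  # word_tokenize(text) in the original
    tokens = []
    valid_positions = []
    for word in words:  # the unused `i` from enumerate is dropped
        pieces = stub_subword_tokenize(word)
        tokens.extend(pieces)
        for j in range(len(pieces)):  # renamed inner loop variable
            valid_positions.append(1 if j == 0 else 0)
    return tokens, valid_positions

tokens, valid = tokenize("Washington is nice")
print(tokens)  # ['Wash', '##ingt', '##on', 'is', 'nice']
print(valid)   # [1, 0, 0, 1, 1]
```

Each word contributes exactly one 1 to `valid_positions`, so the labels (one per original word) can later be aligned with the sub-token sequence.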
bert.py#L49