Describe the bug

In `classy/data/dataset/hf/classification.py#L89` we invoke `self.tokenize` (#L109), which correctly truncates the input. The issue arises due to `tuple(tok_encoding.word_to_tokens(wi)) for wi in range(len(tokens))`: when a token is not included in the input due to truncation, `word_to_tokens` returns `None`, and `tuple(None)` raises a `TypeError`. This triggers the catch condition and makes the function return `None`, which cannot be unpacked in `input_ids, token_offsets = self.tokenize(token_sample.tokens)`, resulting in another unhandled exception that finally crashes classy.

To Reproduce

In the token classification setting, input a sentence that has too many tokens (or lower the truncation limit to obtain the same effect).
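A minimal, self-contained sketch of the failure mode, using a hypothetical `FakeEncoding` stand-in for the real fast-tokenizer `BatchEncoding` (the `n_kept` name and the spans it returns are illustrative, not classy's actual code):

```python
# Stand-in for a fast-tokenizer encoding after truncation: kept words map to a
# (start, end) token span, truncated words map to None -- the same contract as
# BatchEncoding.word_to_tokens. `FakeEncoding` and `n_kept` are hypothetical.
class FakeEncoding:
    def __init__(self, n_kept):
        self.n_kept = n_kept

    def word_to_tokens(self, wi):
        if wi < self.n_kept:
            return (wi + 1, wi + 2)  # +1 accounts for a leading special token
        return None  # word wi was dropped by truncation

tokens = ["tok"] * 600       # more words than the truncation limit keeps
enc = FakeEncoding(n_kept=512)

try:
    offsets = [tuple(enc.word_to_tokens(wi)) for wi in range(len(tokens))]
except TypeError as e:
    # tuple(None) raises "'NoneType' object is not iterable" -- this is the
    # exception that the original catch clause turns into a silent `return None`.
    print(f"TypeError: {e}")
```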
Expected behaviour

I think there is a way to know how many of the original tokens were kept, and we could iterate over that count instead of `len(tokens)`; otherwise, we can simply iterate until `word_to_tokens(wi)` returns `None`. Comments?