In `texar-pytorch/bin/utils`, `sentencepiece` is a package rather than a tokenization method, so "sentencepiece encoding" is not an accurate way to describe the tokenization method. `sentencepiece` includes two sub-word tokenization methods: byte-pair encoding (BPE) [Sennrich et al.] and the unigram language model [Kudo]. There is no Word Piece Model (WPM) pipeline in `sentencepiece`, and the code we used actually relies on the unigram language model, which is the package's default method. The `transformer` example needs to be updated accordingly.
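For reference, a minimal sketch of how the model type is selected when training with the `sentencepiece` Python package (the input file name, model prefix, and vocab size below are placeholders; when `model_type` is omitted, `unigram` is used by default):

```python
import sentencepiece as spm

# Train a sub-word model; model_type can be "unigram" (default), "bpe",
# "char", or "word". There is no WPM option.
spm.SentencePieceTrainer.train(
    input="corpus.txt",          # placeholder training corpus
    model_prefix="spm_unigram",  # placeholder output prefix
    vocab_size=8000,
    model_type="unigram",
)

# Load the trained model and tokenize a sentence into sub-word pieces.
sp = spm.SentencePieceProcessor(model_file="spm_unigram.model")
print(sp.encode("This is a test.", out_type=str))
```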