In Assignment 1.3 it is written: "This will load the data in a bag-of-words representation where rare words (occurring less than 5 times in the training data) are removed". However, when I sum the word occurrences using the provided training dataset with
scr = srs.SentimentCorpus("books")
I get words that do not appear at all in the training data (i.e., occurring fewer than 5 times):
>> scr.train_X.sum(0)
[..., 0.0, ...]
Yes, the whole corpus (training + dev) is used to discard rare words. This is because training and dev are not separated until after this filtering is performed.
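The effect described above can be reproduced with a minimal sketch (toy data, not the actual lxmls loading code): when the "occurs at least 5 times" filter is applied to the combined corpus before the train/dev split, a word can survive the filter yet have a zero column sum in `train_X` alone.

```python
import numpy as np

# Hypothetical document-term count matrices (rows = documents, cols = words).
train_counts = np.array([
    [3, 0, 1],
    [2, 0, 0],
])
dev_counts = np.array([
    [0, 6, 1],
])

# The filter is applied to the combined corpus, not to train alone.
combined = np.vstack([train_counts, dev_counts])

# Keep words occurring at least 5 times in the *combined* corpus.
keep = combined.sum(axis=0) >= 5
train_X = train_counts[:, keep]

# Word 1 passes the filter (6 occurrences in dev) but never appears in train,
# so train_X has a zero column sum, matching the observation in the issue.
print(train_X.sum(axis=0))  # → [5 0]
```

Filtering on training counts only (`train_counts.sum(axis=0) >= 5`) would remove that column and make the assignment text literally accurate.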