For a document that contains more than 512 words, how do you split the data? I have two ideas.
As an example, take a document of 5 words, ABCDE, and assume a window size of 2.
First, it can be split into three independent documents: 'AB', 'CD' and 'E'. The problem is that each chunk loses the context on the other side of its boundary, which may hurt performance.
Second, it can be split with a sliding window. For example, with a window size of 3 words (one target word plus 1 word of padding on each side), the document is split into five documents: 'AB', 'ABC', 'BCD', 'CDE' and 'DE'. In 'BCD', B and D are padding and the target word is C.
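To make the two ideas concrete, here is a minimal Python sketch. The function names `split_fixed` and `split_sliding` are hypothetical and made up for this illustration, and the sketch works on a plain list of words rather than real tokenizer output. It is not taken from the repository's preprocessing code.

```python
# Hypothetical helpers illustrating the two splitting ideas; not the repository's code.

def split_fixed(tokens, window):
    """Idea 1: non-overlapping chunks of at most `window` tokens."""
    return [tokens[i:i + window] for i in range(0, len(tokens), window)]


def split_sliding(tokens, padding):
    """Idea 2: one window per target token, with up to `padding`
    context tokens on each side (fewer at the document edges)."""
    windows = []
    for i in range(len(tokens)):
        start = max(0, i - padding)
        end = min(len(tokens), i + padding + 1)
        windows.append(tokens[start:end])
    return windows


doc = list("ABCDE")
fixed = ["".join(chunk) for chunk in split_fixed(doc, window=2)]
sliding = ["".join(chunk) for chunk in split_sliding(doc, padding=1)]
print(fixed)    # ['AB', 'CD', 'E']
print(sliding)  # ['AB', 'ABC', 'BCD', 'CDE', 'DE']
```

With the second idea every word appears as a target exactly once, but the number of documents grows from 3 to 5 in this example, so it costs more compute.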
Do you use one of the methods above, or a different one?
Thank you!