I want to build a word cloud from news articles, in which people and places may be described by more than one word (e.g. "New York" or "Ursula Von Der Leyen").
With the current tokenization, "New" becomes independent from "York", and the same happens with the parts of a name.
Is there a way to represent these "groups" so they stay together?
For example, I could imagine a format like:
"Rome London 'New York' Biden 'Rishi Sunak' Mumbai"
where the single quote would mean "keep the string together".
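To make the idea concrete, here is a minimal sketch of how such a quoted format could be parsed. It is written in Python with the standard library only and does not use this library's API; the function name `count_phrases` is made up for illustration. It relies on `shlex.split`, which applies shell-style quoting rules, so a single-quoted group comes out as one token.

```python
import shlex
from collections import Counter


def count_phrases(text):
    """Count tokens in `text`, treating each quoted group as a single token.

    `count_phrases` is a hypothetical helper, not part of any word cloud library.
    """
    # shlex.split breaks on whitespace but keeps 'New York' together.
    tokens = shlex.split(text)
    return Counter(tokens)


frequencies = count_phrases(
    "Rome London 'New York' Biden 'Rishi Sunak' Mumbai 'New York'"
)
print(frequencies["New York"])
print(sorted(frequencies))
```

One caveat with this approach: an unbalanced apostrophe in the input (as in "O'Brien") makes `shlex.split` raise a `ValueError`, so real article text would need escaping or a different delimiter. If the word cloud library accepts a word-to-frequency mapping (the Python `wordcloud` package, for instance, has `WordCloud.generate_from_frequencies`), the resulting counts could be passed in directly and the tokenizer would be bypassed.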
Thanks for any ideas and thank you SO MUCH for this wonderful library !!!!