I was wondering whether the `preprocess` function could be improved: right now it strips punctuation before and after usernames and URLs. Or was that done on purpose? I couldn't find a justification for it in your paper.
Right now, the preprocess function below would convert:
```python
# Preprocess text (username and link placeholders)
def preprocess(text):
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)
```
It seems to me that punctuation could help the model predict the sentiment of a tweet a little better if it were available to it. Another example: some Twitter users start their tweets with a dot placed directly before the mention, as in ".@rudy".
They do that to avoid the reply system while still mentioning a username. With the current preprocessing function, "@rudy" doesn't get replaced, because the dot right before the @ means the token no longer starts with "@".
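To make both cases concrete, here is a small reproduction. The function is the one quoted earlier in this issue; the two sample tweets (and the example.com URL) are made up for illustration and are not taken from your data.

```python
# Same logic as the preprocess function quoted in this issue.
def preprocess(text):
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)

# Hypothetical tweets, for illustration only.
# Punctuation glued to a mention or URL disappears with the token:
print(preprocess("Thanks @rudy, see https://example.com!"))
# prints: Thanks @user see http

# A leading dot stops the mention from being replaced at all:
print(preprocess(".@rudy great talk"))
# prints: .@rudy great talk
```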
Is there a particular reason the preprocessing function was written this way, or could we make it more flexible on our end by keeping the punctuation next to usernames and URLs?
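For what it's worth, below is a rough sketch of the kind of change I have in mind. It is only an assumption about what "more flexible" could look like, not something from your repository: the name `preprocess_keep_punct`, the two regular expressions, and the set of trailing punctuation characters are all my own choices. Note that, unlike the original, it only treats tokens containing `http://` or `https://` as links, and I have not checked how it affects model accuracy.

```python
import re

# Assumption: a mention is "@" followed by word characters, and must not be
# preceded by a word character (so e-mail addresses are left alone).
MENTION_RE = re.compile(r"(?<!\w)@\w+")
# Assumption: a link starts with http:// or https:// and runs to the next space.
URL_RE = re.compile(r"https?://\S+")
# Assumption: these characters at the end of a link are sentence punctuation.
TRAILING_PUNCT = ".,;:!?)\"'"


def preprocess_keep_punct(text):
    """Replace mentions and links with placeholders, keeping adjacent punctuation."""

    def replace_url(match):
        url = match.group(0)
        stripped = url.rstrip(TRAILING_PUNCT)
        # Keep whatever punctuation was trimmed from the end of the link.
        return "http" + url[len(stripped):]

    # Links first, so an "@" inside a URL is not mistaken for a mention.
    text = URL_RE.sub(replace_url, text)
    return MENTION_RE.sub("@user", text)


print(preprocess_keep_punct("Thanks @rudy, see https://example.com!"))
# prints: Thanks @user, see http!
print(preprocess_keep_punct(".@rudy great talk"))
# prints: .@user great talk
```

Since the model was presumably trained on text produced by the original function, inputs preprocessed this way may differ from what it saw during training, which is partly why I am asking before changing anything.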
Thank you!