Thanks for making VADER. I'm working on another port and am having a blast.
There are instances of words/emoticons that have two entries with different sentiment values in the most recent version of vader_lexicon.txt. This is a potential source of bugs and inconsistencies between ports. I've included the list below with the line number in vader_lexicon.txt, the words, and the sentiment values.
It looks like the Python version of VADER takes the last value it finds. For example, "lol" has two sentiment values: +2.9 at line 305 and +1.8 at line 4406. To reproduce the output for test sentence 13 from the main Readme (copied below), I need to assign "lol" a sentiment of 1.8.
Today only kinda sux! But I'll get by, lol----------------------- {'pos': 0.317, 'compound': 0.5249, 'neu': 0.556, 'neg': 0.127}
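Last-value-wins is what you get if a loader stores each entry in a plain dictionary, because a later duplicate silently overwrites the earlier one. The sketch below assumes simplified two-column `token<TAB>value` lines (the real vader_lexicon.txt has extra columns after the mean), and `load_lexicon` is a hypothetical name rather than the Python VADER API.

```python
def load_lexicon(text: str) -> dict[str, float]:
    """Parse tab-separated 'token<TAB>value' lines into a dict.

    Hypothetical helper: a plain dict assignment means a later duplicate
    overwrites an earlier one, so the last occurrence of a token wins.
    """
    lexicon: dict[str, float] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        token, measure = line.strip().split("\t")[:2]
        lexicon[token] = float(measure)  # overwrites any earlier entry
    return lexicon


# Simplified stand-ins for lines 305 and 4406 of vader_lexicon.txt.
sample = "lol\t2.9\nlol\t1.8\n"
print(load_lexicon(sample)["lol"])  # 1.8
```

Any port that keeps the first value instead (for example, by skipping tokens it has already seen) will score "lol" as 2.9 and diverge from the Readme output.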
I see three main options:
1. Leave it as-is. This seems least desirable, since it leads to unpredictable and potentially inconsistent behaviour across instantiations.
2. Update the dictionary to match the current behaviour by removing the first instance of each of the 14 duplicated words below, so that only the value the Python version already uses remains. This would be easy, but the potential downside is that some of the differences are big: e.g. "d:" has a positive instance and a negative instance, and one "sob" value is more than double the other in magnitude.
3. Update the dictionary to match your intuition. A case-by-case approach wouldn't take long since there are only 14 instances, and a standard approach (e.g. averaging the two values) would also be simple.
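To compare the options concretely, the three policies (keep first, keep last, average) can be applied to (token, value) pairs in file order. This is a minimal sketch: `resolve_duplicates` and its `policy` names are hypothetical, and the sample pairs use the values for "d:" and "lol" from the table in this issue, ordered by their line numbers.

```python
from collections import defaultdict
from statistics import mean


def resolve_duplicates(entries, policy="last"):
    """Collapse (token, value) pairs into a single value per token.

    policy:
      "first" keeps the earliest entry,
      "last"  keeps the latest entry (the behaviour described for Python VADER),
      "mean"  averages every entry for the token.
    """
    grouped = defaultdict(list)
    for token, value in entries:
        grouped[token].append(value)  # values stay in file order per token

    if policy == "first":
        return {tok: vals[0] for tok, vals in grouped.items()}
    if policy == "last":
        return {tok: vals[-1] for tok, vals in grouped.items()}
    if policy == "mean":
        return {tok: mean(vals) for tok, vals in grouped.items()}
    raise ValueError(f"unknown policy: {policy!r}")


# "d:" at lines 227 and 1740, "lol" at lines 305 and 4406.
pairs = [("d:", -2.9), ("lol", 2.9), ("d:", 1.2), ("lol", 1.8)]
print(resolve_duplicates(pairs, "last"))  # {'d:': 1.2, 'lol': 1.8}
```

Averaging would give "d:" a value of about -0.85, which shows why a case-by-case review may be preferable for the entries whose two values disagree in sign.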
Obviously it's your call, but I didn't see this in any other Issues or Pull Requests so I wanted to surface it. I'm happy to chat or help in any way I can.
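| line number | word | sentiment |
|---|---|---|
| 120 | :-p | 1.2 |
| 124 | :-p | 1.5 |
| 227 | d: | -2.9 |
| 1740 | d: | 1.2 |
| 230 | d= | -3 |
| 1741 | d= | 1.5 |
| 234 | fav | 2.4 |
| 2831 | fav | 2 |
| 301 | lmao | 2 |
| 4399 | lmao | 2.9 |
| 305 | lol | 2.9 |
| 4406 | lol | 1.8 |
| 320 | muah | 2.8 |
| 4730 | muah | 2.3 |
| 342 | o.o | -0.6 |
| 4853 | o.o | -0.8 |
| 352 | ok | 1.6 |
| 4895 | ok | 1.2 |
| 385 | sob | -2.8 |
| 6188 | sob | -1 |
| 411 | x-d | 2.7 |
| 7489 | x-d | 2.6 |
| 412 | x-p | 1.8 |
| 7490 | x-p | 1.7 |
| 413 | xd | 2.7 |
| 7491 | xd | 2.8 |
| 417 | xp | 1.2 |
| 7492 | xp | 1.6 |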
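A list like the one above can be regenerated mechanically, which would also make it easy to re-check after the lexicon is edited. A minimal Python sketch, assuming the tab-separated layout with the token in the first column and the mean sentiment in the second; `find_duplicates` and `report` are hypothetical helpers, and whether their 1-based line numbers match the table exactly depends on how the table was produced.

```python
from collections import defaultdict


def find_duplicates(lines):
    """Map each token that appears more than once to [(line_number, value), ...].

    `lines` is any iterable of tab-separated lexicon lines (e.g. an open file).
    Line numbers are 1-based; blank lines are skipped but still counted.
    """
    seen = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        token, measure = line.split("\t")[:2]
        seen[token].append((lineno, float(measure)))
    return {tok: hits for tok, hits in seen.items() if len(hits) > 1}


def report(path="vader_lexicon.txt"):
    """Print every duplicated token in the file at `path` (default path is an assumption)."""
    with open(path, encoding="utf-8") as f:
        for token, hits in find_duplicates(f).items():
            print(token, hits)
```

Running a check like this in each port's test suite would flag new duplicates before they cause divergent scores.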
This issue stumped me as well during the development of my own port. There are even more duplicates, for example:

| line no. | element | sentiment |
|---|---|---|
| 342 | o.o | -0.6 |
| 4853 | o.o | -0.8 |
I worked around the issue by replacing existing mappings with subsequent entries, thus keeping the original lexicon intact. However, as you mentioned, this does not seem like a sustainable solution. I would really appreciate a follow-up from @cjhutto or any of the other co-authors as to what would be the most appropriate permanent option.