WikiConv example Jupyter NoteBook Code not working #96
Comments
Hey there @ErikJSchmidt,
This is related to #59 - basically, the current WikiConv corpora were created with an older version of ConvoKit, and in the meantime the ConvoKit utterance API has changed in ways that make the saved modification metadata no longer compatible with the current API (specifically, the current API has utterance.id redirect to utterance._id, but there was no such distinction in ConvoKit when the modification data was computed, leading to the error that you see).
The reason you see this with the specific conversation you listed, and not with other conversations, is that the conversation you found happens to contain some modification data (most conversations don't, so when randomly selecting conversations chances are the erroring code will never be triggered).
We're working on constructing an updated WikiConv which will address this issue, but as noted in the linked comments this may take a while due to computational resource issues.
Thankfully, you do not have to wait for an updated WikiConv if you just want to run the example code - there is a workaround that can be used to avoid this issue! What you can do is bypass the utterance API entirely by unpacking each modification object into a dict. To do this, you can simply replace all calls to
This change will allow
Note that if you want to run the entire demo notebook, there is a similar issue in the function
Hope this helps! From my testing this workaround allows the entire notebook to run without issue on the conversation you mentioned, but let us know if anything else comes up. And sorry for the inconvenience!
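A minimal sketch of the idea behind this workaround, not the notebook's actual replacement code. It uses a hypothetical stand-in class, Record, rather than ConvoKit's Utterance, and assumes the stored modification objects carry a plain id attribute in their instance dict while the current class exposes id as a property that reads _id. The helper names make_legacy_record and as_plain_dict are illustrative only.

```python
class Record:
    """Hypothetical stand-in for a class whose public `id` now redirects to `_id`."""

    @property
    def id(self):
        return self._id


def make_legacy_record(fields):
    """Create an instance without calling __init__ and fill its __dict__ directly,
    roughly how an object saved under an older class layout would come back."""
    record = Record.__new__(Record)
    record.__dict__.update(fields)
    return record


def as_plain_dict(obj):
    """Copy the stored attributes into a plain dict, bypassing any properties."""
    return dict(vars(obj))


legacy = make_legacy_record(
    {"id": "1277066.3845.3845", "reply_to": "1275892.3573.3573"}
)

# Attribute access goes through the property, which looks for `_id` and fails.
try:
    legacy.id
    error_message = None
except AttributeError as err:
    error_message = str(err)

# Dict access reads the stored value directly and succeeds.
fields = as_plain_dict(legacy)
print(error_message)
print(fields["id"], fields["reply_to"])
```

With the data in a dict, a comparison such as fields["id"] == other["reply_to"] no longer touches the property, so the missing _id attribute is never looked up.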
As of 03/21/2022, the ConvoKit WikiConv corpora have been updated and the modification metadata now work correctly, so the workaround described in the previous comment is no longer needed.
Situation
Hey there, I want to work with the WikiConv corpus and think the ConvoKit framework should make that a lot easier.
To get started I tried to follow along with the example notebook. I downloaded the 2003 corpus to my machine via
from convokit import Corpus, download
corpus_dir_path_cluster = "my/path/"
wikiconv_2003 = Corpus(filename=download("wikiconv-2003", data_dir=corpus_dir_path_cluster))
Then I started copying the code from the notebook into a Python file, step by step, to get it working.
Now when using
print_final_conversation
everything works fine as long as the modification, deletion and restoration lists are not used.
The problem
Now, for example, when I put only the conversation with id
'1275892.3573.3573'
in the
random_conversations
list like:
random_conversations = [wikiconv_2003.get_conversation('1275892.3573.3573')]
and call
print_final_conversation(random_conversations, wikiconv_2003)
then we get into the function
check_lists_for_match
with the parameter str(utterance) =
Utterance(id: '1277066.3845.3845', conversation_id: 1275892.3573.3573, reply-to: 1275892.3573.3573, speaker: Speaker(id: Ruhrjung, vectors: [], meta: {'user_id': '10582'}), timestamp: 1060644108.0, text: 'Germany is no state of USA. That is established, even in Wikipedia_talk:Naming_conventions_(city_names), where virtually no arguments for the comma-notion are presented, although the debate there seems to have fallen asleep (about July 22nd). ', vectors: [], meta: {'is_section_header': False, 'indentation': '1', 'toxicity': 0.0887862, 'sever_toxicity': 0.01593855, 'ancestor_id': '1275929.3845.3845', 'rev_id': '1275929', 'parent_id': None, 'original': ({'id': '1275929.3845.3845', 'root': '1275892.3573.3573', 'reply_to': '1275892.3573.3573', 'timestamp': 1060640971.0, 'text': 'Germany is no state of USA. That is established, even in Wikipedia_talk:Naming_conventions_(city_names), where virtually no arguments for the comma-notion are presented, although the debate there seems to have fallen asleep. ', 'meta': {'is_section_header': False, 'indentation': '1', 'toxicity': 0.0887862, 'sever_toxicity': 0.01593855, 'ancestor_id': '1275929.3845.3845', 'rev_id': '1275929', 'parent_id': None, 'original': None, 'modification': [], 'deletion': [], 'restoration': []}}), 'modification': [({'id': '1277066.3845.3845', 'root': '1275892.3573.3573', 'reply_to': '1275892.3573.3573', 'timestamp': 1060644108.0, 'text': 'Germany is no state of USA. That is established, even in Wikipedia_talk:Naming_conventions_(city_names), where virtually no arguments for the comma-notion are presented, although the debate there seems to have fallen asleep (about July 22nd). ', 'meta': {'is_section_header': False, 'indentation': '1', 'toxicity': 0.06772689, 'sever_toxicity': 0.009684636, 'ancestor_id': '1275929.3845.3845', 'rev_id': '1277066', 'parent_id': '1275929.3845.3845', 'original': None, 'modification': [], 'deletion': [], 'restoration': []}})], 'deletion': [], 'restoration': []})
and therefore str(modification_list) =
Modifications[({'id': '1277066.3845.3845', 'root': '1275892.3573.3573', 'reply_to': '1275892.3573.3573', 'timestamp': 1060644108.0, 'text': 'Germany is no state of USA. That is established, even in Wikipedia_talk:Naming_conventions_(city_names), where virtually no arguments for the comma-notion are presented, although the debate there seems to have fallen asleep (about July 22nd). ', 'meta': {'is_section_header': False, 'indentation': '1', 'toxicity': 0.06772689, 'sever_toxicity': 0.009684636, 'ancestor_id': '1275929.3845.3845', 'rev_id': '1277066', 'parent_id': '1275929.3845.3845', 'original': None, 'modification': [], 'deletion': [], 'restoration': []}})]
That leads to the check
if (utterance_val.id == next_utterance_value.reply_to):
that produces
AttributeError: 'Utterance' object has no attribute '_id'
My insights so far
When looking at the modifications list I see
'id': '1277066.3845.3845'
which I guess should correspond to the utterance's _id. So I don't get why the id is said to be missing.
I am not too familiar with Python, but I wonder why str(utterance) has the id printed as
id: '1277066.3845.3845'
without '' while the utterance in the modification list has
'id': '1277066.3845.3845'
where id is wrapped in ''. To me it looks like the utterances in the modification list are in a JSON-like format and I don't know why.
Also, the notebook was last updated on Nov 20, 2019, while the WikiConv models were last updated on Dec 1, 2020. So maybe the notebook is outdated?
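On the quoting difference, a small sketch in plain Python may help. It assumes nothing about ConvoKit internals: Item is a hypothetical class with a hand-written __repr__, used only to show that an object chooses its own printed form, while a dict always prints its string keys in quotes.

```python
class Item:
    """Hypothetical class whose __repr__ prints field names without quotes."""

    def __init__(self, id):
        self.id = id

    def __repr__(self):
        # The class decides how it is printed; the field name is literal text here.
        return f"Item(id: {self.id!r})"


# An object prints however its class's __repr__ says.
item_text = str(Item("1277066.3845.3845"))

# A dict prints each key with its own repr, so string keys appear quoted.
dict_text = str({"id": "1277066.3845.3845"})

print(item_text)
print(dict_text)
```

So a quoted 'id' in printed output suggests that the value is shown in dict form at that point, whereas the unquoted id: comes from an object's custom string representation.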
Conclusion
It would help me a lot if anyone could check whether they can reproduce this AttributeError, or whether this problem occurs only for me.
Thank you all for your help.