Code and Data for YouTube Cross-Talk Study

We release the code and data for the following paper. If you use these datasets, or refer to our findings, please cite:

Siqi Wu and Paul Resnick. Cross-Partisan Discussions on YouTube: Conservatives Talk to Liberals but Liberals Don't Talk to Conservatives. International AAAI Conference on Web and Social Media (ICWSM), 2021. [paper|slides|poster]

Data

The data is hosted on Dataverse. See more details in this data description.

Plots

The plots reported in the paper can be reproduced with the scripts in the plots directory, using the aggregate video/user data provided in the data directory as input files.

Scrapers

The crawler directory contains all scripts for building our data collection pipeline. Before running them, copy conf.py to local_conf.py, then fill in your Twitter and YouTube credentials in local_conf.py.
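The copy step can be scripted. The sketch below is only an illustration: the helper name `init_local_conf` is hypothetical, and it assumes conf.py sits directly inside the crawler directory. It deliberately refuses to overwrite an existing local_conf.py so that credentials already entered are not lost.

```python
import shutil
from pathlib import Path


def init_local_conf(crawler_dir="crawler"):
    """Copy conf.py to local_conf.py unless local_conf.py already exists.

    Hypothetical helper; assumes conf.py lives directly in `crawler_dir`.
    Returns the path to local_conf.py.
    """
    crawler_dir = Path(crawler_dir)
    template = crawler_dir / "conf.py"
    local = crawler_dir / "local_conf.py"
    if local.exists():
        # Never overwrite a file that may already hold real credentials.
        return local
    # Raises FileNotFoundError if the template is missing.
    shutil.copyfile(template, local)
    return local


if __name__ == "__main__":
    path = init_local_conf()
    print(f"Edit {path} and fill in your Twitter and YouTube credentials.")
```

Run it from the repository root, then edit the resulting local_conf.py by hand. Keeping credentials in a separate local file makes it easy to exclude them from version control.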

Obtaining political leaning for seed users

The prediction directory contains all scripts for estimating the political leaning labels for seed users.

Pre-trained Hierarchical Attention Network (HAN)

Our HAN model is adapted from the hnatt code. See this for how to use our pre-trained HAN models to predict a user's political leaning from a set of comments.

Python version

The HAN module was tested in Python 2.7. All other code was developed and tested in Python 3.7. See requirements.txt for more details.