GitHub - rbkhb/NLP_IMC: Tutorial session on extracting information from social media data. Part of the Interacting Minds Center's NLP Workshop at Aarhus University on Nov 7, 2019.

AU Interacting Minds Centre, NLP Workshop - November 7th

Stance Detection & Topic Modelling of Social Media Users' Content

Rebekah Baglini, Luca Nannini, and Arnault-Quentin Vermillet

Link

Program

Data Preprocessing

Load Dataset
Tokenization/Stopword Removal
Clean Tweets Strings with Regular Expressions
Lemmatization/Stemming
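
As a rough illustration of the preprocessing steps listed above, here is a minimal sketch using only the Python standard library. It is not the notebook's code: the stopword list, the regular expressions, and the `naive_stem` helper are all assumptions made for this example, and a real pipeline would more likely use a dedicated stopword list and a proper stemmer or lemmatizer.

```python
import re

# Tiny illustrative stopword list (an assumption, not the notebook's list).
STOPWORDS = {"a", "an", "the", "and", "or", "is", "are", "to", "of",
             "in", "for", "on", "my", "i", "this", "that", "it", "rt"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions, and non-letter characters."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)  # also drops '#', digits, punctuation
    return re.sub(r"\s+", " ", text).strip()


def naive_stem(token):
    """Very rough suffix stripping; a stand-in for a real stemmer or lemmatizer."""
    for suffix in ("ations", "ation", "ing", "es", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Clean, tokenize on whitespace, remove stopwords, then stem."""
    tokens = clean_tweet(text).split()
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]


example = "RT @user: Vaccines are SAFE! https://t.co/abc #vaccination"
print(preprocess(example))
```

Cleaning before tokenizing keeps URLs and mentions from leaking fragments into the token list, which is why the regular-expression step runs first in this sketch.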

Topic modeling

Create, Run, and Train the HDP model via Gensim
Visualize topics in an interactive graph with pyLDAvis
Visualize cosine similarity between topics as a heatmap
HDP and LDA via Gensim Models
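
The cosine comparison of topics mentioned above can be sketched without any modelling library. The example below reads it as pairwise cosine similarity and assumes each topic is represented as a vector of word weights over a shared vocabulary; the `topics` values are made up for illustration. In the workshop the vectors would come from the trained Gensim model, and the resulting matrix would be passed to a plotting library to draw the heatmap.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero topic as similar to nothing
    return dot / (norm_u * norm_v)


def similarity_matrix(topics):
    """Pairwise cosine similarities; entry [i][j] compares topic i with topic j."""
    return [[cosine_similarity(a, b) for b in topics] for a in topics]


# Hypothetical topic-word weights over a 4-word vocabulary.
topics = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.5, 0.5, 0.0, 0.0],
]
matrix = similarity_matrix(topics)
for row in matrix:
    print([round(x, 2) for x in row])
```

Topics that share no weighted words score 0, and topics with identical weight profiles score 1, so near-duplicate topics show up as bright off-diagonal cells in the heatmap.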

Supervised text classification with BERT

Datasets

Vacc_tweets_raw_n5000.csv

Random sample of 5000 (out of > 1 million) tweets from 2019 containing string 'vaccin'
Collected using GetOldTweets3 scraper

Additional sets in Data folder

5-topic_stance_tweets_training_n2814.csv

Training set from SemEval2016 Task 6 for stance detection task
Labels: FAVOR, AGAINST, UNKNOWN
Topics:
- Atheism
- Climate change is a concern
- Feminist movement
- Hillary Clinton
- Legalization of abortion

Vacc_articles_w_stance_n3303.csv

From the Vaccine sentiment project. Contains 3303 sentences extracted from online articles labelled pro (n=24), neg (n=22), and neu(tral) (n=7).

Vacc_tweets_w_stance_n1131.csv

1131 tweets containing string 'vaccin' labeled for stance
Labels = pro, anti, unknown
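
Before training a classifier on a labelled set like this one, it can help to check how the labels are distributed. The sketch below uses only the standard library and an in-memory sample. The column names `text` and `stance` and the sample rows are assumptions for illustration, not taken from the actual CSV, so adjust `label_column` to match the real header.

```python
import csv
import io
from collections import Counter


def stance_counts(csv_file, label_column="stance"):
    """Count rows per stance label in an open CSV file with a header row."""
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column].strip().lower() for row in reader)


# Hypothetical rows standing in for the real file.
sample = io.StringIO(
    "text,stance\n"
    "Vaccines save lives,pro\n"
    "I will never vaccinate,anti\n"
    "Vaccination clinic opens Monday,unknown\n"
    "Get your flu shot,pro\n"
)
counts = stance_counts(sample)
print(counts.most_common())
```

To run it on a file from the Data folder, pass an open file handle instead, for example `open(path, newline="", encoding="utf-8")`, where `path` points to your local copy.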

Or upload your own dataset using the upload widget in the Colab notebook.

Code files

During the tutorial, we will be working from a Google Colab notebook. This means you will not have to install or load anything locally.
If you'd like to run the code locally, we've included a list of dependencies in requirements.txt.

Data
BERT_Fine_Tuning_Sentence_Classification.ipynb
NLP_Workshop_Text_cleaning,_topic_modeling_and_classification.ipynb
README.md
Raw_vacc_tweets_n5000.csv
requirements.txt
