This is a course project for SOCI 40133 Computational Content Analysis. As a large body of data and research shows that the gap between the wealthy and the poor has kept widening in recent years, I am interested in how online media frame this phenomenon. Extracting topics and examining trends in inequality coverage may offer a new perspective on the causes of, and solutions to, inequality. The goal of this project is to analyze English-language online news coverage of inequality by applying computational methods, including topic modeling, word embeddings, and word and phrase frequency analysis, to all news articles containing "inequality" in the News on the Web (NOW) corpus, in order to see how that coverage changed from 2010 to 2019.
LDA topic modeling: to see whether, and how, the topics in online news coverage of income inequality changed from 2010 to 2019.
Word embeddings: to describe the general worldview of online news by examining how close words are to one another in the embedding space. After that, I will build several semantic dimensions and project the news articles onto them to see whether there are systematic differences (or similarities) in how the causes of and solutions to inequality are framed.
Word and phrase frequency analysis: we generated word clouds to capture how the most salient words changed over 2010-2019.
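The embedding step of building dimensions and projecting articles onto them can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual code: it assumes a dimension is built by averaging the normalized difference vectors of opposing word pairs, and that an article is represented by the mean of its word vectors. The pole words, the toy 3-dimensional vectors, the example articles, and the function names (`build_dimension`, `document_vector`, `project`) are all hypothetical placeholders; real vectors would come from an embedding model trained on the NOW articles.

```python
import numpy as np

# Hypothetical toy embeddings (3-d), standing in for vectors trained on NOW articles.
embeddings = {
    "individual": np.array([1.0, 0.0, 0.2]),
    "structural": np.array([-1.0, 0.0, 0.2]),
    "personal": np.array([0.9, 0.1, 0.0]),
    "systemic": np.array([-0.9, 0.1, 0.0]),
    "effort": np.array([0.8, 0.3, 0.1]),
    "policy": np.array([-0.7, 0.4, 0.1]),
    "tax": np.array([-0.6, 0.2, 0.3]),
}


def build_dimension(emb, pairs):
    """Average the unit-normalized difference vectors of (pole_a, pole_b) pairs."""
    diffs = []
    for a, b in pairs:
        diff = emb[a] - emb[b]
        diffs.append(diff / np.linalg.norm(diff))
    dim = np.mean(diffs, axis=0)
    return dim / np.linalg.norm(dim)


def document_vector(emb, tokens):
    """Mean of the vectors of in-vocabulary tokens; None if no token is known."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return None
    return np.mean(vecs, axis=0)


def project(vec, dim):
    """Cosine similarity between a document vector and a dimension."""
    return float(np.dot(vec, dim) / (np.linalg.norm(vec) * np.linalg.norm(dim)))


# Hypothetical "individual vs. structural causes" dimension built from two pole pairs.
cause_dim = build_dimension(
    embeddings, [("individual", "structural"), ("personal", "systemic")]
)

# Two hypothetical tokenized articles; out-of-vocabulary tokens are skipped.
article_a = ["effort", "personal", "unseen_token"]
article_b = ["policy", "tax"]

score_a = project(document_vector(embeddings, article_a), cause_dim)
score_b = project(document_vector(embeddings, article_b), cause_dim)
print(f"article_a: {score_a:+.3f}, article_b: {score_b:+.3f}")
```

With these toy vectors, `article_a` scores positive (toward the "individual" pole) and `article_b` scores negative (toward the "structural" pole). Averaging such scores by publication year would be one possible way to look for shifts in framing over 2010-2019.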
Data: the News on the Web (NOW) corpus, provided by the instructor.