Skip to content

Identifying bias in the media with sentiment analysis: a case study.

License

Notifications You must be signed in to change notification settings

jphalip/media-bias

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 

Repository files navigation

This repository contains all the code I wrote to support a case study published on my blog: http://julienphalip.com/post/156908898512/identifying-bias-in-the-media-with-sentiment

The study aims to evaluate bias in the media using sentiment analysis of video titles published by some prominent American TV channels on their Youtube accounts.

Setup

A bit of setting up is required before you can run this code.

Google API key

First, you need to get an API key from Google by following the steps described here: https://developers.google.com/api-client-library/python/guide/aaa_apikeys

This key will be used for two services:

  • Google Cloud Natural Language API
  • YouTube Data API v3

Once you've acquired a key, save it into a file named google-api-key.txt at the root of this repository.

Python environment

The following Python packages need to be installed in your Python environment:

ipython==5.1.0
pandas==0.19.1
google-api-python-client==1.5.5
unicodecsv==0.14.1

Acquiring data

Four types of datasets must be generated: channels, topics, videos and sentiment scores.

Channels

Create a channels.csv file using the structure detailed in this example:

channels = pandas.DataFrame.from_records([
    {'title': 'Fox News', 'slug': 'fox-news', 'youtube_id': 'UCXIJgqnII2ZOINSWNOGFThA', 'playlist_id': 'UUXIJgqnII2ZOINSWNOGFThA', 'url': 'https://www.youtube.com/user/FoxNewsChannel', 'color': '#5975a4'},
    {'title': 'CNN', 'slug': 'cnn', 'youtube_id': 'UCupvZG-5ko_eiXAupbDfxWw', 'playlist_id': 'UUupvZG-5ko_eiXAupbDfxWw', 'url': 'https://www.youtube.com/user/CNN', 'color': '#b55d60'},
    {'title': 'MSNBC', 'slug': 'msnbc', 'youtube_id': 'UCaXkIU1QidjPwiAYu6GcHjg', 'playlist_id': 'UUaXkIU1QidjPwiAYu6GcHjg', 'url': 'https://www.youtube.com/user/msnbcleanforward', 'color': '#5f9e6e'},
    {'title': 'CBS News', 'slug': 'cbs-news', 'youtube_id': 'UC8p1vwvWtl6T73JiExfWs1g', 'playlist_id': 'UU8p1vwvWtl6T73JiExfWs1g', 'url': 'https://www.youtube.com/user/CBSNewsOnline', 'color': '#666666'},
])

channels.to_csv('channels.csv', index=False, encoding='utf-8')

The youtube_id is the channel's unique Youtube ID. Finding out a channel's ID is a little tricky:

  • Go to the channel's page (e.g. https://www.youtube.com/user/CNN)
  • View the HTML source of the page.
  • Look for "data-channel-external-id" in the HTML source. The value associated with it is the channel's Youtube ID.

The playlist_id corresponds to a channel's default playlist where all its videos are published. To retrieve a channel's playlist_id:

Topics

Create a topics.csv file using the structure detailed in this example:

topics = pandas.DataFrame.from_records([
    {'title': 'Obama', 'slug': 'obama', 'variant1': 'Obama', 'variant2': 'Obamas'},
    {'title': 'Clinton', 'slug': 'clinton','variant1': 'Clinton', 'variant2': 'Clintons'},
    {'title': 'Trump', 'slug': 'trump','variant1': 'Trump', 'variant2': 'Trumps'},
    {'title': 'Democrats', 'slug': 'democrats', 'variant1': 'Democrat', 'variant2': 'Democrats'},
    {'title': 'Republicans', 'slug': 'republicans', 'variant1': 'Republican', 'variant2': 'Republicans'},
    {'title': 'Liberals', 'slug': 'liberals', 'variant1': 'Liberal', 'variant2': 'Liberals'},
    {'title': 'Conservatives', 'slug': 'conservatives', 'variant1': 'Conservative', 'variant2': 'Conservatives'},
])

topics.to_csv('topics.csv', index=False, encoding='utf-8')

The variants are the different terms that will be searched for in the video titles in order to match videos with your topics of choice.

Videos

Run the following snippets of code in order to download all the video metadata from Youtube for your channels of choice:

First, this will download all video information and create a separate CSV file for each channel (e.g. videos-cnn.csv):

from code.youtube_api import download_channels_videos

download_channels_videos(channels)

Second, this will merge all the CSV files generated above into a single videos-MERGED.csv file.

from code.youtube_api import merge_channel_videos
merge_channel_videos(channels)

Lastly, this will create extra columns for each topic:

from code.utils import create_topic_columns

videos = pd.read_csv('videos-MERGED.csv')
create_topic_columns(videos, topics)
videos.to_csv('videos.csv', index=False, encoding='utf-8')

You now have a videos.csv file containing all the video metadata for all channels.

Sentiment scores

The last step is to download sentiment scores from the Google Natural Language API. Note that this API is not free. Make sure to first refer to the API's pricing page for adequate budgeting.

Run the following:

from code.language_api import download_sentiments

download_sentiments(videos)

You now have a sentiments.csv file containing the sentiment scores for all relevant videos.

Exploring and analysing the data

Check out my blog post for some inspiration on how to explore and analyze the data: http://julienphalip.com/post/156908898512/identifying-bias-in-the-media-with-sentiment

About

Identifying bias in the media with sentiment analysis: a case study.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages