An Open-Source Strategy for Documenting Events: The Case Study of the 42nd Canadian Federal Election on Twitter

This article examines the tools, approaches, collaboration, and findings of the Web Archives for Historical Research Group surrounding the capture and analysis of approximately 4 million tweets during the 2015 Canadian Federal Election. We hope that national libraries and other institutions will find our model useful as they consider how to archive ongoing events using Twitter.

While Twitter is not a representative sample of broader society (Pew Research notes that its users skew young, college-educated, and affluent, with household incomes above $50,000), it still represents an exponential increase in the amount of information generated, retained, and preserved from non-elite people. Therefore, when historians study the 2015 federal election, Twitter will be a prime source.

On August 3, 2015, the team began collecting tweets with the hashtag #elxn42 using twarc, drawing on both Twitter's Search API and its Streaming API. Data collection ceased on November 5, 2015, the day after Justin Trudeau was sworn in as the 23rd Prime Minister of Canada. We collected for a total of 102 days, 13 hours and 50 minutes.
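Running a Search API collection and a Streaming API collection side by side means the same tweet can appear in both. A minimal sketch of merging them, assuming each collection was saved as line-oriented JSON (one tweet per line, which is how twarc writes its output) and that the Twitter v1.1 `id_str` field identifies each tweet. The function name `merge_collections` is hypothetical and not part of twarc.

```python
import json
from typing import Iterable, Iterator


def merge_collections(*collections: Iterable[str]) -> Iterator[dict]:
    """Yield each tweet once across several line-oriented JSON collections.

    Tweets are keyed on their "id_str" field; the first copy seen wins,
    so list the collection you prefer to keep first.
    """
    seen = set()
    for lines in collections:
        for line in lines:
            line = line.strip()
            if not line:
                continue  # skip blank lines between records
            tweet = json.loads(line)
            tweet_id = tweet["id_str"]
            if tweet_id in seen:
                continue  # already captured by an earlier collection
            seen.add(tweet_id)
            yield tweet
```

For example, passing two open file handles (for hypothetical files such as `elxn42-search.jsonl` and `elxn42-stream.jsonl`) to `merge_collections` would yield every unique tweet once, keeping the copy from whichever collection is listed first.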

To analyze the data set, we used several utilities available within twarc and twarc-report, as well as jq, Mathematica, and Apache Spark Notebook. In accordance with Twitter's Terms of Service, we made the tweet IDs and other derivative data available in a data repository.
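Publishing only tweet IDs is often called "dehydrating" a collection; others can later rebuild ("hydrate") the full tweets from those IDs. twarc provides `dehydrate` and `hydrate` commands for this. The sketch below only illustrates the idea in plain Python, assuming line-oriented Twitter v1.1 JSON input; the `dehydrate` function here is a hypothetical stand-in, not twarc's own code.

```python
import json
from typing import Iterable, Iterator


def dehydrate(lines: Iterable[str]) -> Iterator[str]:
    """Yield the tweet ID ("id_str") from each line of a JSON-lines collection."""
    for line in lines:
        line = line.strip()
        if line:  # ignore blank lines
            yield json.loads(line)["id_str"]
```

Writing the result one ID per line produces a plain-text file that can be deposited in a repository without redistributing tweet text.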

Our analytics included:

  • breaking tweet text down by day to track change over time;
  • client analysis, showing which applications tweets were sent from and how the prevalence of mobile devices shaped interaction with the medium;
  • URL analysis, comparing tweeted URLs against both Archive-It collections and the Wayback Availability API to better understand crawl completeness;
  • and image analysis, using an archive of extracted images.
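As a rough illustration of three of these analyses (tweets per day, client counts, and URL availability in the Wayback Machine), here is a plain-Python sketch. It is not the team's jq, Mathematica, or Spark Notebook workflow, and it does not cover image analysis. It assumes Twitter v1.1 JSON fields (`created_at`, `source`, `entities.urls[].expanded_url`) and the Internet Archive's public Availability endpoint at `https://archive.org/wayback/available`. All function names are hypothetical, and network requests are left to the caller.

```python
import json
import re
from collections import Counter
from datetime import datetime
from typing import Iterable, Optional
from urllib.parse import urlencode

# Twitter v1.1 "created_at" format, e.g. "Mon Aug 03 14:02:11 +0000 2015".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"
# The v1.1 "source" field is an HTML anchor; capture only its link text.
SOURCE_TEXT = re.compile(r">([^<]*)<")


def tweets_per_day(tweets: Iterable[dict]) -> Counter:
    """Count tweets by UTC calendar day (YYYY-MM-DD)."""
    days = Counter()
    for tweet in tweets:
        created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
        days[created.date().isoformat()] += 1
    return days


def client_counts(tweets: Iterable[dict]) -> Counter:
    """Count tweets by the client application named in the "source" field."""
    clients = Counter()
    for tweet in tweets:
        source = tweet.get("source", "")
        match = SOURCE_TEXT.search(source)
        # Fall back to the raw value when "source" is not an HTML anchor.
        clients[match.group(1) if match else source] += 1
    return clients


def expanded_urls(tweets: Iterable[dict]) -> Counter:
    """Count the expanded URLs listed in each tweet's entities."""
    urls = Counter()
    for tweet in tweets:
        for url in tweet.get("entities", {}).get("urls", []):
            if url.get("expanded_url"):
                urls[url["expanded_url"]] += 1
    return urls


def wayback_query(url: str) -> str:
    """Build a Wayback Machine Availability API request URL for one URL."""
    return "https://archive.org/wayback/available?" + urlencode({"url": url})


def closest_snapshot(response_body: str) -> Optional[str]:
    """Return the closest archived snapshot URL from an API response, if any."""
    snapshots = json.loads(response_body).get("archived_snapshots", {})
    closest = snapshots.get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None  # no capture found: a candidate gap in crawl coverage
```

Bucketing by UTC day is a simplifying choice; an analysis aimed at Canadian audiences might first convert timestamps to a local time zone. Comparing the keys of `expanded_urls` against `closest_snapshot` results gives a simple measure of how many shared URLs had any archived capture.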

Our article introduces our collecting work and the analysis we have done, and provides a framework for other collecting institutions to do similar work with the same off-the-shelf open-source tools.