---
layout: page
title: Datasets
permalink: /data/
---

End of Term Datasets

The End of Term project is working with the Amazon Web Services (AWS) Open Data Sponsorship Program to host a copy of the 2004, 2008, 2012, 2016, and 2020 End of Term Datasets.

The work of inventorying, staging, and moving the data into AWS is ongoing, and more information will be provided here as it becomes available.

The following datasets are currently partially available for use:

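| Dataset  | WARC Files | WARC Size (Compressed) |
|----------|-----------:|-----------------------:|
| EOT-2020 |     239811 |              266.04 TB |
| EOT-2016 |     194683 |               139.3 TB |
| EOT-2012 |      78509 |               41.42 TB |
| EOT-2008 |     125704 |               15.32 TB |
| EOT-2004 |      58977 |                6.42 TB |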
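
For anyone planning storage or transfer, the figures in the table above can be totaled with a short script. This is only a sketch: the numbers are copied from the table as published and will change as more data is staged, the function names `summarize` and `mean_warc_size_gb` are illustrative, and the per-file average assumes decimal units (1 TB = 1000 GB), which the table does not specify.

```python
# Figures copied from the table above:
# (dataset, WARC file count, compressed size in TB).
DATASETS = [
    ("EOT-2020", 239811, 266.04),
    ("EOT-2016", 194683, 139.3),
    ("EOT-2012", 78509, 41.42),
    ("EOT-2008", 125704, 15.32),
    ("EOT-2004", 58977, 6.42),
]


def summarize(rows):
    """Return (total WARC file count, total compressed size in TB)."""
    total_files = sum(count for _, count, _ in rows)
    # Round to two decimals to avoid floating-point noise in the sum.
    total_tb = round(sum(size for _, _, size in rows), 2)
    return total_files, total_tb


def mean_warc_size_gb(count, size_tb):
    """Average compressed size per WARC file in GB.

    Assumption: decimal units (1 TB = 1000 GB).
    """
    return size_tb * 1000 / count


if __name__ == "__main__":
    for name, count, size_tb in DATASETS:
        avg = mean_warc_size_gb(count, size_tb)
        print(f"{name}: {count} files, {size_tb} TB, ~{avg:.2f} GB per file")

    total_files, total_tb = summarize(DATASETS)
    print(f"Total: {total_files} files, {total_tb} TB")
```

Because the datasets are only partially available, these totals describe what has been inventoried so far rather than the final size of the collection.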

End of Term Web Crawls Collection

Additionally, crawl data is available from the Internet Archive via the End of Term Web Crawls collection.