---
layout: page
title: End of Term 2004 Dataset
permalink: /data/data-2004/
---

End of Term 2004 Dataset

The End of Term 2004 Dataset consists of data collected by the National Archives and Records Administration (NARA) as part of its 2004 Presidential Term web archive. This dataset was identified in the Internet Archive's collections and has been included in the End of Term Presidential Web Archive.

Archive Location and Download

The 2004 End of Term archive is stored in the eotarchive S3 bucket under the EOT-2004 prefix.

To help with exploring and using the dataset, we provide gzipped files that list all segments and all WARC, WAT, WET, META, CDX, and URL index files.

Prepending either s3://eotarchive/ or https://eotarchive.s3.amazonaws.com/ to each line gives the S3 or HTTP path, respectively.
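The prefixing step can be sketched in Python as follows. This is a minimal illustration rather than an official tool: the function names `expand_paths` and `fetch_listing` are hypothetical, and the sketch assumes that each listing holds one bucket-relative path per line and that the listing files themselves can be downloaded from the same HTTP prefix.

```python
import gzip
import io
from urllib.request import urlopen

S3_PREFIX = "s3://eotarchive/"
HTTP_PREFIX = "https://eotarchive.s3.amazonaws.com/"


def expand_paths(gz_bytes, prefix):
    """Decompress a *.paths.gz listing and prepend `prefix` to every non-empty line.

    Assumes the listing contains one bucket-relative path per line.
    """
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8") as fh:
        return [prefix + line.strip() for line in fh if line.strip()]


def fetch_listing(key="EOT-2004/warc.paths.gz"):
    """Download a listing file over HTTP (requires network access).

    Assumes the listing is served at HTTP_PREFIX + key.
    """
    with urlopen(HTTP_PREFIX + key) as resp:
        return resp.read()
```

With network access, `expand_paths(fetch_listing(), HTTP_PREFIX)` would produce downloadable URLs for the WARC files, and passing `S3_PREFIX` instead would produce the corresponding S3 paths.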

| File Type | File List | # Files | Total Size (Compressed) |
|---|---|---|---|
| Segments | EOT-2004/segment.paths.gz | 6 | |
| WARC files | EOT-2004/warc.paths.gz | 58977 | 6.42 TB |
| WAT files | EOT-2004/wat.paths.gz | 58977 | 108.34 GB |
| WET files | EOT-2004/wet.paths.gz | 58977 | 18.21 MB |
| META files | EOT-2004/meta.paths.gz | 58977 | 36.27 GB |
| CDX files | EOT-2004/cdx.paths.gz | 58977 | 5.82 GB |
| URL Index files | EOT-2004/eot-index.paths.gz | 49 | 4.5 GB |