This repository contains example data sets used in *Data Science for Public Policy*.

File Name | Format | Description | Source | Chapters |
--- | --- | --- | --- | --- |
doe_ny.xlsx | Excel | Daily New York Harbor Conventional Gasoline Regular Spot Price FOB (Dollars per Gallon) - Jan 2, 2014 through Oct 23, 2017 | US Energy Information Administration | 4 |
doe_usgulf.csv | CSV | U.S. Gulf Coast Conventional Gasoline Regular Spot Price FOB (Dollars per Gallon) - Jan 2, 2014 through Oct 23, 2017 | US Energy Information Administration | 4 |
chicago_crime.Rda | Rda | Chicago crime data (2015 to 2018) | City of Chicago | 4 |
watch_lists.Rda | Rda | Watch/sanction lists containing names of entities. | United Nations, US Department of Commerce, European Commission and United Kingdom | 5 |
access_exercise.csv | CSV | Net acceleration data from a sample of exercise activities. | Jeffrey Chen (Author) | 6 |
county_compare.Rda | Rda | Assortment of economic characteristics (e.g. educational attainment, employment market, tech sector size) for US counties in 2016 | US Census County Business Patterns, US Census Small Area Income and Poverty Estimates (SAIPE) | 6, 11 |
polb.csv | CSV | Port of Long Beach shipping container volume (1996 to 2019) | Port of Long Beach | 6 |
econ_vintage.Rda | Rda | Real-Time Data Set for Macroeconomists -- collection of quarterly economic series based on first release (1962 to 2019) | Federal Reserve Bank of Philadelphia | 6 |
home_sales_nyc.csv | CSV | Random sample of New York City one family home sales (2017 to 2018) | New York City Department of Finance | 7 |
ukhpi.csv | CSV | UK Housing Price Index (2010 to 2019) for all units -- sales volume and average price | HM Land Registry | 7 |
did_data.Rda | Rda | Simulated difference-in-differences time series -- See simulated-data repository for code | Gary Cornwall (Author) | 8 |
coindesk.csv | CSV | Bitcoin prices (2010 to 2017) | Coindesk | 8 |
ntlights.Rda | Rda | Median nighttime light measures from DMSP for the Indian state of Uttarakhand | NOAA/USAF DMSP, University of Michigan, World Bank | 8 |
acs_health.Rda | Rda | U.S. Census Bureau's ACS sample for Georgia -- 2015 | U.S. Census Bureau American Community Survey 2015 | 9 |
acs_health_expanded.Rda | Rda | U.S. Census Bureau's ACS sample for Georgia -- 2015. Discrete variables are expanded into a one-hot encoding (dummy variable matrix). | U.S. Census Bureau American Community Survey 2015 | 9 |
cropscape.png | PNG | Raster of crop fields from NASA MODIS imagery, developed for the USDA CropScape project. | U.S. Department of Agriculture | 10 |
sandy_trees.csv | CSV | Location of likely downed trees after Hurricane Sandy, derived from 311 call records aggregated into a grid of 1,000 ft x 1,000 ft cells. | Derived from New York City 311 | 10 |
wages.Rda | Rda | Random sample of n = 3,000 labor force participants in California over 18 years old who earned any wages (2016). | U.S. Census Bureau | 10 |
qcew_cali_sa.Rda | Rda | Quarterly employment count for California counties from the Quarterly Census of Employment and Wages (QCEW) - 1992 to 2016. Sample has been seasonally adjusted using STL decomposition. | U.S. Bureau of Labor Statistics | 11 |
modis-fires.csv | CSV | Location of active fires detected using NASA MODIS imagery (Feb 10 to 17, 2020) | NASA Earth Science | 12 |
chicago-crime-2018.csv | CSV | Chicago crime data (2018) | City of Chicago | 12 |
chicago-police-districts | SHP | Shapefile of Chicago Police Districts | City of Chicago | 12 |
chicago-police-stations.csv | CSV | Location of Chicago Police Stations | City of Chicago | 12 |
deficit-articles.csv | CSV | Collection of news articles about the federal budget deficit | Associated Press, NYTimes, Market Watch, The Hill and CNBC | 13 |
wiki-article.txt | TXT | 1973 Oil Crisis excerpt | Wikipedia | 13 |
sotu.csv | CSV | State of the Union speeches through 2019. Speeches were scraped from Wikisource. | Wikisource | 13 |