This repository contains historical SF housing data and R scripts to graph that data. The data here was used to generate the graphs and analysis in the blog post "Employment, construction, and the cost of San Francisco apartments", and was recently used in a paper by Stanford researchers, "The Effects of Rent Control Expansion on Tenants, Landlords, and Inequality: Evidence from San Francisco".
Data for each year lives in a file named after that year. Files for later years may be named "craigslist-YYYY" instead.
You can extract the rents from a data file by running, for example:
./extract-craigslist craigslist-2016
Note that the data is not perfect. Here are some sample lines from the 2016 Craigslist data:
799000 Apr 29 Exceptional Pacific Heights TIC $799000 / 2br - (Pacific Heights) pic
800 Apr 29 Awesome 5 Bedroom Available $800 / 5br - 3895ft2 - (2483 N Smiderle, San Bernardino, CA) pic
99 Apr 29 Jr. 1 BD. Washer & Dryer in unit! $99 deposit $3425 / 1br - 550ft2 - (nob hill) pic map
(It's not clear whether prices like these were filtered out before the averages in the housing-inventory file were generated.)
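The samples above suggest one way to make extraction more robust: take the dollar amount that directly precedes " / " (the bedroom count) instead of the first dollar amount on the line, and discard values outside a plausible monthly-rent range. The following is a minimal Python sketch of that idea applied to a raw pasted listing line, not the logic of the repository's extract-craigslist script; the regular expression, the function name extract_rent, and the MIN_RENT/MAX_RENT bounds are all assumptions.

```python
import re
from typing import Optional

# Assumption: the listing price is the dollar amount followed by " / ",
# as in "$3425 / 1br". This skips stray amounts such as "$99 deposit"
# that can appear earlier in the listing title.
PRICE_RE = re.compile(r"\$(\d+) / ")

# Assumption: hypothetical bounds for a plausible monthly rent.
MIN_RENT = 500
MAX_RENT = 20000


def extract_rent(line: str) -> Optional[int]:
    """Return the monthly rent from one raw listing line, or None."""
    match = PRICE_RE.search(line)
    if match is None:
        return None  # no "$N / " price on this line
    rent = int(match.group(1))
    if not MIN_RENT <= rent <= MAX_RENT:
        return None  # likely a sale price, deposit, or typo
    return rent
```

With these assumptions, the "nob hill" listing yields 3425 rather than 99, and the $799000 TIC sale is dropped. The $800 San Bernardino listing would still be accepted, since rejecting listings outside San Francisco needs a separate location check.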
You can merge the data sources by running the combine script:
./combine
This generates the "combined" file in this repository.
The charts in the blog post are generated by running the model script in this repository on the combined data.
The calc-medians script computes summary statistics for each year in the dataset: it prints the median, 95th percentile, and 5th percentile rent for each year. These values are stored in the medians file in this repository.
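For illustration, the per-year statistics that calc-medians reports can be sketched with Python's standard statistics module (Python 3.8 or later). This is not the script itself: the function name summarize, the dict-of-lists input, and the "inclusive" quantile method are assumptions, and calc-medians may interpolate percentiles differently.

```python
import statistics
from typing import Dict, List, Tuple


def summarize(rents_by_year: Dict[str, List[int]]) -> List[Tuple[str, float, float, float]]:
    """Return (year, median, 95th percentile, 5th percentile) for each year.

    Hypothetical helper: assumes each year maps to a list of at least two rents.
    """
    rows = []
    for year in sorted(rents_by_year):
        rents = rents_by_year[year]
        # n=20 produces 19 cut points at 5%, 10%, ..., 95%,
        # so cuts[0] is the 5th percentile and cuts[18] the 95th.
        cuts = statistics.quantiles(rents, n=20, method="inclusive")
        rows.append((year, statistics.median(rents), cuts[18], cuts[0]))
    return rows
```

Computing all 19 cut points in one call keeps the 5th and 95th percentiles consistent with each other, at the cost of a little unused work for the intermediate cut points.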
To get the Craigslist data, open the SF rentals page, select all, and copy and paste the page's contents into a text file. Repeat for every results page, appending to the same text file until done. Save this file as craigslist-YYYY-MM.
Combine the monthly Craigslist files into one file per year, e.g.:
cat craigslist-2019-* > craigslist-2019
After pulling in new data, recalculate the medians:
./calc-medians > medians