Disbursement Converter

Converts House Disbursement PDFs into detail and summary CSV files.

Source PDFs: http://disbursements.house.gov/
Resulting Data: https://sunlightfoundation.com/projects/expenditures/

Originally authored by Luke Rosiak with improvements by James Turk for the Sunlight Foundation.

Use

  1. Visit disbursements.house.gov and download a single volume PDF like this one:

http://disbursements.house.gov/2013q4/2013q4_singlevolume.pdf

  2. Cut the single volume PDF down to just the disbursement pages (adjust the page range to match the volume you downloaded):
pdftk 2013q4_singlevolume.pdf cat 9-2342 output 2013q4-disbursements-only.pdf
  3. Extract the text from this disbursements-only PDF (this writes 2013q4-disbursements-only.txt):
pdftotext -layout 2013q4-disbursements-only.pdf
  4. Run this script on that text file:
python parse-disbursements.py 2013q4-disbursements-only.txt
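
The steps above can also be chained from a small wrapper. The following is a minimal sketch, not part of this repository: it assumes pdftk and pdftotext are on your PATH, that the volume is named like 2013q4_singlevolume.pdf, and that you supply the page range yourself. The function names build_commands and run_pipeline are hypothetical.

```python
import subprocess
from pathlib import Path


def build_commands(volume_pdf, first_page, last_page):
    """Return the three commands from the steps above for one volume.

    Assumption: the volume filename starts with the quarter (e.g. 2013q4),
    so its first six characters are reused for the output filenames.
    """
    quarter = Path(volume_pdf).name[:6]
    trimmed_pdf = f"{quarter}-disbursements-only.pdf"
    text_file = f"{quarter}-disbursements-only.txt"
    return [
        # Keep only the disbursement pages.
        ["pdftk", volume_pdf, "cat", f"{first_page}-{last_page}",
         "output", trimmed_pdf],
        # With no output name given, pdftotext writes <name>.txt next to the PDF.
        ["pdftotext", "-layout", trimmed_pdf],
        # Parse the extracted text into CSV files.
        ["python", "parse-disbursements.py", text_file],
    ]


def run_pipeline(volume_pdf, first_page, last_page):
    """Run each command in order, stopping at the first failure."""
    for command in build_commands(volume_pdf, first_page, last_page):
        subprocess.run(command, check=True)
```

For example, run_pipeline("2013q4_singlevolume.pdf", 9, 2342) would issue the same three commands shown in the steps, provided that page range is right for that volume.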

The script reads the quarter it is processing from the first six characters of the .txt filename (for example, 2013q4), so keep that prefix intact when naming your files.
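
To illustrate the filename convention, here is a hedged sketch of how a quarter could be read from the first six characters of a filename. It is not the code in parse-disbursements.py; the YYYYqN pattern is inferred from the example filenames, and quarter_from_filename is a hypothetical helper.

```python
import re
from pathlib import Path


def quarter_from_filename(path):
    """Return (year, quarter) parsed from the first six characters of the name.

    Assumption: the prefix looks like 2013q4 (four-digit year, 'q', quarter 1-4).
    """
    prefix = Path(path).name[:6]
    match = re.fullmatch(r"(\d{4})q([1-4])", prefix, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"filename does not start with a quarter: {prefix!r}")
    return int(match.group(1)), int(match.group(2))
```

Calling quarter_from_filename("2013q4-disbursements-only.txt") returns (2013, 4), while a name such as "disbursements-2013q4.txt" raises ValueError because the quarter is not at the start.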