GitHub - rbagd/fertility-us: Automating treatment of US natality data

This repository contains the complete set of scripts that automate the construction of fertility density curves from US natality data. The workflow is as follows:

1. download data in archived files and unzip those; the unarchived yearly data files are quite large and can go up to 5.8 GB
2. optionally, download documentation files to understand data layout
3. extract only the columns of interest from yearly data files for further processing
4. construct an SQLite database with data from the extracted columns
5. do statistical analysis on the data by querying the SQL database
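
The column-extraction step can be pictured as slicing fixed character positions out of each record. The repository does this with a bash script; the sketch below uses Python purely for illustration and assumes fixed-width records. The field names and column positions in `LAYOUT` are hypothetical and are not taken from the actual data layout, which is described in the documentation files.

```python
# Hypothetical layout: (field name, start, end), 0-based and end-exclusive.
# The real positions are NOT given in this README and may differ by year.
LAYOUT = [("year", 0, 4), ("mother_age", 4, 6), ("plurality", 6, 7)]


def extract_fields(line, layout=LAYOUT):
    """Return the selected fields of one fixed-width record as a dict of strings."""
    return {name: line[start:end].strip() for name, start, end in layout}


def extract_file(lines, layout=LAYOUT):
    """Yield one comma-separated row per record, keeping only the chosen columns."""
    for line in lines:
        fields = extract_fields(line.rstrip("\n"), layout)
        yield ",".join(fields[name] for name, _, _ in layout)


# Two made-up records; everything after the first seven characters is ignored.
sample = ["1971251XXXX\n", "1971302YYYY\n"]
rows = list(extract_file(sample))
```

Because `extract_file` is a generator, it can be fed an open file handle and will process a multi-gigabyte yearly file one line at a time rather than loading it into memory.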

The first three steps use only bash scripts. They are kept separate because each of them may be time-consuming. The last two steps use Python and R, respectively. Constructing the SQLite database may take a few hours.
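
As a rough illustration of the database-construction step, the following sketch loads already-extracted rows into SQLite using Python's standard `sqlite3` module. The table name `births`, its columns, and the `build_database` helper are assumptions made for this example; they are not necessarily what `preprocessed/data.py` does.

```python
import sqlite3


def build_database(rows, path=":memory:"):
    """Create a births table and bulk-insert (year, mother_age, plurality) tuples.

    The schema is hypothetical; adapt it to the columns actually extracted.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS births "
        "(year INTEGER, mother_age INTEGER, plurality INTEGER)"
    )
    # Run all inserts inside one transaction; committing per row is much slower.
    with conn:
        conn.executemany("INSERT INTO births VALUES (?, ?, ?)", rows)
    # An index on year helps later per-year queries.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_births_year ON births (year)")
    return conn


# Made-up rows, used only to show the call.
conn = build_database([(1971, 25, 1), (1971, 30, 2), (1972, 25, 1)])
n_rows = conn.execute("SELECT COUNT(*) FROM births").fetchone()[0]
```

Passing a file path instead of `":memory:"` writes the database to disk. With yearly files of several gigabytes, wrapping the inserts in a single transaction is one of the simpler ways to keep load time manageable.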

To reconstruct the data, one has to run the following commands.

chmod +x {download,unzip,extract}
./download && ./unzip && ./extract
python preprocessed/data.py
Rscript preprocessed/data.R

Some extra dependencies are required, notably the sqlite module for Python and the RSQLite library for R. Additional R libraries are useful for processing the raw data and producing plots.

Plurality data, i.e. the number of children at birth, are not available for 1969 and 1970. The SQLite database is therefore constructed from 1971 onward.
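
The analysis step queries the database; in this repository it is done in R. As a hedged illustration only, the Python sketch below computes the share of births by mother's age for a single year. This is one plausible reading of a "fertility density curve" and is an assumption about the analysis, not a description of `preprocessed/data.R`. The `births` table, its columns, and the `age_density` helper are hypothetical.

```python
import sqlite3


def age_density(conn, year):
    """Return {mother_age: share of that year's births}, or {} if the year has no rows."""
    rows = conn.execute(
        "SELECT mother_age, COUNT(*) FROM births WHERE year = ? "
        "GROUP BY mother_age ORDER BY mother_age",
        (year,),
    ).fetchall()
    total = sum(count for _, count in rows)
    return {age: count / total for age, count in rows} if total else {}


# Small in-memory database with made-up rows, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE births (year INTEGER, mother_age INTEGER, plurality INTEGER)"
)
conn.executemany(
    "INSERT INTO births VALUES (?, ?, ?)",
    [(1971, 25, 1), (1971, 25, 1), (1971, 30, 2), (1972, 31, 1)],
)
density = age_density(conn, 1971)
```

Years missing from the database, such as 1969 and 1970, simply yield an empty result here.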
