---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# CongressData: A Functional Tool for the CongressData Dataset<img src="figures/CongressData.png" height="150" align="right"/>
`CongressData` is a package designed to let a user with only basic knowledge of R interact with **CongressData**, a dataset with nearly 800 variables that compiles information about all US congressional districts from 1789 to 2023, and with its codebook. The dataset tracks district characteristics, members of Congress, and the behavior of those members in policymaking. Users can find variables related to demographics, politics, and policy; subset the data across multiple dimensions; create custom aggregations of the dataset; and access citations in both plain text and BibTeX for every variable.
## Installing this Package
`CongressData` is a functional package that interacts with the CongressData dataset via the internet. Install the package from GitHub like so:
```{r eval=F}
# use the devtools library to download the package from GitHub
library(devtools)
# install CongressData from the ippsr/CongressData repository
install_github("ippsr/CongressData")
```
## Finding Variables
`get_var_info`: Retrieve information about the variables in CongressData and identify variables of interest. The function searches the codebook to find the years each variable is observed in the data, a short and long description of each variable, and the source and citation(s) for each variable. Citations are available in both BibTeX and plain text. Search for broad terms like 'tax' with the `related_to` argument, partial-match variable names with `var_names`, or both.
```{r}
suppressMessages(library(dplyr))
library(CongressData)
# variables related to health insurance
h_ins_cong <- get_var_info(related_to = "health insurance")
cat("There are",nrow(h_ins_cong),"variables related to health insurance in CongressData")
head(h_ins_cong$variable)
# variables with 'under18' in their name
under18_cong <- get_var_info(var_names = "under18")
head(under18_cong$variable)
```
`get_var_info` returns the following information to simplify using CongressData:
- variable: Variable name
- year: The precise years the variable is observed
- short_desc: A short description of the variable
- long_desc: A long description of the variable
- source: The sources of the data
- category: The variable's category (not all variables are coded)
- plaintext_cite[1-4]: Plain text citation(s) for the data
- bibtext_cite[1-4]: BibTeX citation(s) for the data
## Accessing CongressData
`get_cong_data`: Access all or part of CongressData with `get_cong_data`. Subset by state names with `states` and by years with `years` (either a single year or a two-year vector giving the minimum and maximum years you want). You can also use the `related_to` argument to search across variable names, short/long descriptions from the codebook, and citations for non-exact matches of a supplied term. For example, searching 'tax' will return variables with words like 'taxes' and 'taxable' in any of those columns.
```{r}
# load the entire dataset
all_the_dat <- get_cong_data()
# subset by state, topic, and years
cong_subset <- get_cong_data(
  states = c("Indiana", "Kentucky", "Michigan"),
  related_to = "tax",
  years = c(1960, 1980)
)
```
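To make the subsetting behavior described above more concrete, here is a minimal base R sketch on a small made-up data frame. The `toy` data frame, its column names, the inclusive treatment of the year range, and the case-insensitive `grepl()` match are all assumptions for illustration; this is not how `get_cong_data` is implemented internally.

```{r}
# hypothetical toy data; these are not real CongressData values
toy <- data.frame(
  state = c("Indiana", "Kentucky", "Ohio", "Michigan"),
  year = c(1958, 1964, 1970, 1982),
  stringsAsFactors = FALSE
)

# assumption: a two-element `years` vector is an inclusive min/max range
years <- c(1960, 1980)
states <- c("Indiana", "Kentucky", "Michigan")
toy_subset <- toy[toy$state %in% states &
                    toy$year >= min(years) &
                    toy$year <= max(years), ]

# assumption: the search term is matched partially and case-insensitively
candidates <- c("taxes", "Taxable income", "population")
tax_hits <- candidates[grepl("tax", candidates, ignore.case = TRUE)]
```

In this toy example only the Kentucky row survives all three filters, and the search term matches the first two candidate strings but not the third.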
Run `get_congress_version` to see what version of the dataset is available in `CongressData`.
```{r}
CongressData::get_congress_version()
```
## Pulling Citations
`get_var_info`: Each variable in CongressData was collected from external sources, so please cite those sources when you use it. The `get_var_info` function described above returns the citations in both plain text and BibTeX. Supply a vector of variable names through the `var_names` argument and collect the citations from the plain text or BibTeX columns. NOTE: Some variables have multiple citations, so check that you have them all.
```{r}
# plain text citation for a single variable
get_var_info(var_names = "com_benghazi_299") %>%
  pull(plaintext_cite)
# bibtex is also available
get_var_info(var_names = "percent_bus") %>%
  pull(plaintext_cite)
```
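Because some variables carry more than one citation, it can help to gather every citation column at once. The sketch below shows one way to do that in base R. It uses a hypothetical stand-in for the output of `get_var_info`: the `info` data frame, the numbered column names, and the placeholder citation strings are all assumptions, so check the column names in your own result before adapting it.

```{r}
# hypothetical stand-in for get_var_info() output; column names are assumed
# and the citation strings are placeholders, not real references
info <- data.frame(
  variable = "example_var",
  plaintext_cite = "Author A (2000). First source.",
  plaintext_cite2 = "Author B (2010). Second source.",
  plaintext_cite3 = NA_character_,
  stringsAsFactors = FALSE
)

# find every column whose name starts with "plaintext_cite"
cite_cols <- grep("^plaintext_cite", names(info), value = TRUE)

# flatten the first row to a character vector and drop missing or empty entries
all_cites <- unlist(info[1, cite_cols], use.names = FALSE)
all_cites <- all_cites[!is.na(all_cites) & nzchar(all_cites)]
```

Matching on the column-name prefix rather than listing columns by hand means the same code keeps working whether a variable has one citation or several.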
## Dataset and Package Citation
In addition to citing each variable's source, we ask that you cite CongressData if you use this package or the dataset. A recommended citation is below.
> Grossmann, M., Lucas, C., McCrain, J., & Ostrander, I. (2022). CongressData. East Lansing, MI: Institute for Public Policy and Social Research (IPPSR).
## Contact
For questions about the CongressData dataset, contact Ben Yoel ([email protected]).