# Appendix {.unnumbered}
## Solutions {.unnumbered}
### Exercise 1 Solution {.unnumbered}
[Exercise 1](02-working-with-data-sets.qmd#exercise-1)
1:
```stata
<<dd_do>>
sysuse lifeexp
<</dd_do>>
```
3:
```stata
<<dd_do>>
sysuse sandstone, clear
<</dd_do>>
```
or
```stata
<<dd_do>>
clear
sysuse sandstone
<</dd_do>>
```
4:
If the working directory isn't convenient, change it using the dialogue box.
Then, `save sandstone`.
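The same step can also be done from the command line. The sketch below is only an illustration: the folder `path/to/my/folder` is a hypothetical placeholder, so substitute any directory you can write to.

```stata
sysuse sandstone, clear    // load the example data
pwd                        // show the current working directory
cd "path/to/my/folder"     // hypothetical path; replace with your own folder
save sandstone             // writes sandstone.dta to the working directory
```

If `sandstone.dta` already exists in that folder, `save` stops with an error unless you add the `replace` option.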
### Exercise 2 Solution {.unnumbered}
[Exercise 2](03-data-management.qmd#exercise-2)
1:
```stata
<<dd_do>>
webuse census9, clear
<</dd_do>>
```
2: The data is from the 1980 census, which we can see from the data label
(visible in the header of `describe`). We've got two identifiers of state, death
rate, population, median age and census region.
3: Since the data has 50 rows, it's a good guess there are no missing states.
4: Looking at `describe`, we see that the two state identifiers are strings, the
rest numeric.
5: Run `compress`. Nothing is saved, because Stata already compressed the data before posting it.
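Items 2 through 5 can be checked with a handful of commands. This is a minimal sketch, not part of the original solution: `count` and `isid` are included only as one possible way to confirm the number of rows and that `state` identifies each row.

```stata
webuse census9, clear
describe      // header shows the data label; the table lists each storage type
count         // number of observations (rows)
isid state    // errors out if state does not uniquely identify the rows
compress      // reports whether any variable could be stored more compactly
```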
### Exercise 3 Solution {.unnumbered}
[Exercise 3](03-data-management.qmd#exercise-3)
1:
```stata
<<dd_do>>
webuse census9, clear
save mycensus9, replace
<</dd_do>>
```
The `replace` option is added in case I run it twice.
2:
```stata
<<dd_do>>
rename drate deathrate
label variable deathrate "Death rate per 10,000"
<</dd_do>>
```
3:
```stata
<<dd_do>>
tab region
label list cenreg
label define region_label 1 "Northeast" 2 "North Central" 3 "South" 4 "West"
label values region region_label
label drop cenreg
label list
tab region
<</dd_do>>
```
4:
```stata
<<dd_do>>
save, replace
<</dd_do>>
```
### Exercise 4 Solution {.unnumbered}
[Exercise 4](03-data-management.qmd#exercise-4)
```stata
<<dd_do>>
use mycensus9, clear
<</dd_do>>
```
1: Use `summarize` and `codebook` to take a look at the mean/max/min. No errors
detected.
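One way to run that check is sketched below. The choice of variables passed to `codebook` is an assumption made for illustration; any numeric variables of interest would do.

```stata
use mycensus9, clear
summarize                    // mean, standard deviation, min and max of numeric variables
codebook deathrate medage    // range, unique values and missing counts
```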
2:
```stata
<<dd_do>>
codebook, compact
<</dd_do>>
```
`deathrate` and `medage` both have fewer than 50 unique values. This is due to
both being heavily rounded. If we saw more precision, there would be more unique
entries.
3:
```stata
<<dd_do>>
codebook, problems
<</dd_do>>
```
This just flags embedded spaces (` `) in the string variables. Not a real problem!
### Exercise 5 Solution {.unnumbered}
[Exercise 5](04-data-manipulation.qmd#exercise-5)
```stata
<<dd_do>>
use mycensus9, clear
<</dd_do>>
```
1:
```stata
<<dd_do>>
* deathrate is per 10,000, so dividing by 100 gives a percentage
generate deathperc = deathrate/100
label variable deathperc "Percentage of population deceased in 1980"
list death* in 1/5
<</dd_do>>
```
2:
```stata
<<dd_do>>
generate agecat = 1 if medage < .
replace agecat = 2 if medage > 26.2 & medage <= 30.1
replace agecat = 3 if medage > 30.1 & medage <= 32.8
replace agecat = 4 if medage > 32.8 & medage < .
label define agecat_label 1 "Significantly below national average" ///
2 "Below national average" ///
3 "Above national average" ///
4 "Significantly above national average"
label values agecat agecat_label
tab agecat, mi
<</dd_do>>
```
We have no missing data (seen with `summarize` and `codebook` in the previous
exercise), but it's good practice to check for it anyway.
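Stata treats a missing numeric value as larger than any number, which is why the first `generate` is restricted with `medage < .`. A quick way to confirm that nothing is missing, sketched here with commands that the solution above does not use:

```stata
use mycensus9, clear
count if missing(medage)    // number of rows with a missing median age
misstable summarize         // reports variables that contain missing values
```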
3:
```stata
<<dd_do>>
bysort agecat: summarize deathrate
<</dd_do>>
```
We see that the groups with the higher median age tend to have higher
death rates.
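A more compact view of the same comparison could use `tabstat`. This sketch assumes `agecat` from step 2 is still in memory; the statistics listed are just one reasonable choice.

```stata
* one row per age category, with the chosen statistics as columns
tabstat deathrate, by(agecat) statistics(mean min max n)
```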
4:
```stata
<<dd_do>>
preserve
gsort -deathrate
list state deathrate in 1
gsort +deathrate
list state deathrate in 1
gsort -medage
list state medage in 1
gsort +medage
list state medage in 1
restore
<</dd_do>>
```
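An alternative sketch avoids re-sorting the data by using the results that `summarize` stores in `r(min)` and `r(max)`. Unlike `list ... in 1`, it shows every state tied for an extreme value. It is offered as one possible approach, not as the intended solution.

```stata
use mycensus9, clear

* states with the lowest and highest death rate
quietly summarize deathrate
list state deathrate if inlist(deathrate, r(min), r(max))

* states with the lowest and highest median age
quietly summarize medage
list state medage if inlist(medage, r(min), r(max))
```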
5:
```stata
<<dd_do>>
encode state2, gen(statecodes)
codebook statecodes
<</dd_do>>
```