-
-
Notifications
You must be signed in to change notification settings - Fork 60
/
98-answers.Rmd
738 lines (486 loc) · 22.6 KB
/
98-answers.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
# Solutions
```{r, echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE)
library(yarrr)
```
## Chapter 4: The Basics
2. Which (if any) of the following objects names is/are invalid?
```{r, eval = FALSE}
thisone <- 1
THISONE <- 2
1This <- 3 # this one starts with a number
this.one <- 3
This.1 <- 4
ThIS.....ON...E <- 5
This!One! <- 6 # this one has invalid symbols
lkjasdfkjsdf <- 7
```
3. 2015 was a good year for pirate booty - your ship collected 100,000 gold coins. Create an object called \texttt{gold.in.2015} and assign the correct value to it.
```{r}
gold.in.2015 <- 100800
```
4. Oops, during the last inspection we discovered that one of your pirates Skippy McGee hid 800 gold coins in his underwear. Go ahead and add those gold coins to the object `gold.in.2015`. Next, create an object called `plank.list` with the name of the pirate thief.
```{r}
gold.in.2015 <- gold.in.2015 + 800
plank.list <- "Skippy McGee"
```
5. Look at the code below. What will R return after the third line? Make a prediction, then test the code yourself.
```{r, eval = FALSE}
a <- 10
a + 10
a # It will return 10 because we never re-assigned a!
```
## Chapter 5: Scalers and vectors
1. Create the vector [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] in three ways: once using `c()`, once using `a:b`, and once using `seq()`.
```{r}
c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
1:10
seq(from = 1, to = 10, by = 1)
```
2. Create the vector [2.1, 4.1, 6.1, 8.1] in two ways, once using `c()` and once using `seq()`
```{r}
c(2.1, 6.1, 6.1, 8.1)
seq(from = 2.1, to = 8.1, by = 2)
```
3. Create the vector [0, 5, 10, 15] in 3 ways: using `c()`, `seq()` with a `by` argument, and `seq()` with a `length.out` argument.
```{r}
c(0, 5, 10, 15)
seq(from = 0, to = 15, by = 5)
seq(from = 0, to = 15, length.out = 4)
```
4. Create the vector [101, 102, 103, 200, 205, 210, 1000, 1100, 1200] using a combination of the `c()` and `seq()` functions
```{r}
c(seq(from = 101, to = 103, by = 3),
seq(from = 200, to = 210, by = 5),
seq(from = 1000, to = 1200, by = 100))
```
5. A new batch of 100 pirates are boarding your ship and need new swords. You have 10 scimitars, 40 broadswords, and 50 cutlasses that you need to distribute evenly to the 100 pirates as they board. Create a vector of length 100 where there is 1 scimitar, 4 broadswords, and 5 cutlasses in each group of 10. That is, in the first 10 elements there should be exactly 1 scimitar, 4 broadswords and 5 cutlasses. The next 10 elements should also have the same number of each sword (and so on).
```{r}
swords <- rep(c("scimitar", rep("broadswoard", 4), rep("cutlass", 5)), times = 100)
head(swords)
```
6. Create a vector that repeats the integers from 1 to 5, 10 times. That is [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, ...]. The length of the vector should be 50!
```{r}
rep(1:5, times = 10)
```
7. Now, create the same vector as before, but this time repeat 1, 10 times, then 2, 10 times, etc., That is [1, 1, 1, ..., 2, 2, 2, ..., ... 5, 5, 5]. The length of the vector should also be 50
```{r}
rep(1:5, each = 10)
```
8. Create a vector containing 50 samples from a Normal distribution with a population mean of 20 and standard deviation of 2.
```{r}
rnorm(n = 50, mean = 20, sd = 2)
```
9. Create a vector containing 25 samples from a Uniform distribution with a lower bound of -100 and an upper bound of -50.
```{r}
runif(n = 25, min = -100, max = -50)
```
## Chapter 6: Vector Functions
1. Create a vector that shows the square root of the integers from 1 to 10.
```{r}
(1:10) ^ .5
#or
sqrt(1:10)
```
2. Renata thinks that she finds more treasure when she's had a mug of grogg than when she doesn't. To test this, she recorded how much treasure she found over 7 days without drinking any grogg (ie., sober), and then did the same over 7 days while drinking grogg (ie., drunk). Here are her results:
```{r, echo = FALSE}
renata.score <- data.frame(
day = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"),
sober = c(2, 0, 3, 1, 0, 3, 5),
drunk = c(0, 0, 1, 0, 1, 2, 2))
knitr::kable(renata.score, caption = "Renata's treasure haul when she was sober and when she was drunk")
```
How much treasure did Renata find on average when she was sober? What about when she was drunk?
```{r}
sober <- c(2, 0, 3, 1, 0, 3, 5)
drunk <- c(0, 0, 1, 0, 1, 2, 2)
mean(sober)
mean(drunk)
```
3. Using Renata's data again, create a new vector called `difference` that shows how much more treasure Renata found when she was drunk and when she was not. What was the mean, median, and standard deviation of the difference?
```{r}
difference <- sober - drunk
mean(difference)
median(difference)
sd(difference)
```
4. There's an old parable that goes something like this. A man does some work for a king and needs to be paid. Because the man loves rice (who doesn't?!), the man offers the king two different ways that he can be paid. *You can either pay me 100 kilograms of rice, or, you can pay me as follows: get a chessboard and put one grain of rice in the top left square. Then put 2 grains of rice on the next square, followed by 4 grains on the next, 8 grains on the next...and so on, where the amount of rice doubles on each square, until you get to the last square. When you are finished, give me all the grains of rice that would (in theory), fit on the chessboard.* The king, sensing that the man was an idiot for making such a stupid offer, immediately accepts the second option. He summons a chessboard, and begins counting out grains of rice one by one... Assuming that there are 64 squares on a chessboard, calculate how many grains of rice the main will receive. If one grain of rice weights 1/64000 kilograms, how many kilograms of rice did he get? *Hint: If you have trouble coming up with the answer, imagine how many grains are on the first, second, third and fourth squares, then try to create the vector that shows the number of grains on each square. Once you come up with that vector, you can easily calculate the final answer with the `sum()` function.*
```{r, collapse=TRUE}
# First, let's create a vector of the amount of rice on each square:
# It should be 1, 2, 4, 8, ...
rice <- 2 ^ (0:63)
# Here are the first few spaces
head(rice)
# The result is just the sum!
rice.total <- sum(rice)
rice.total
# How much does that weigh? Each grain weights 1/6400 kilograms:
rice.kg <- sum(rice) * 1/6400
rice.kg
# That's 2,900,000,000,000,000 kilograms of rice. Let's keep going....
# A kg of rice is 1,300 calories
rice.cal <- rice.kg * 1300
rice.cal
# How many people can that feed for a year?
# A person needs about 2,250 calories a day, or 2,250 * 365 per year
rice.people.year <- rice.cal / (2250 * 365)
rice.people.year
# So, that amount of rice could feed 4,600,000,000,000 for a year
# Assuming that the averge lifespan is 70 years, how many lifespans could this feed?
rice.people.life <- rice.people.year / 70
rice.people.life
# Ok...so it could feed 65,000,000,000 (65 billion) people over their lives
# Conclusion: King done screwed up.
```
## Chapter 7: Indexing vectors with []
```{r, echo = FALSE}
movie.data <- data.frame("movie" = c("Whatever Works", "It Follows", "Love and Mercy",
"The Goonies", "Jiro Dreams of Sushi",
"There Will be Blood", "Moon",
"Spice World", "Serenity", "Finding Vivian Maier"),
year = c(2009, 2015, 2015, 1985, 2012, 2007, 2009, 1988, 2005, 2014),
boxoffice = c(35, 15, 15, 62, 3, 10, 321, 79, 39, 1.5),
genre = c("Comedy", "Horror", "Drama", "Adventure", "Documentary",
"Drama", "Science Fiction", "Comedy", "Science Fiction",
"Documentary"),
time = c(92, 97, 120, 90, 81, 158, 97, -84, 119, 84),
rating = c("PG-13", "R", "R", "PG", "G", "R", "R",
"PG-13", "PG-13", "Unrated"))
knitr::kable(movie.data, caption = "Some of my favorite movies")
```
0. Create new data vectors for each column.
```{r}
movie <- c("Whatever Works", "It Follows", "Love and Mercy",
"The Goonies", "Jiro Dreams of Sushi",
"There Will be Blood", "Moon",
"Spice World", "Serenity", "Finding Vivian Maier")
year <- c(2009, 2015, 2015, 1985, 2012, 2007, 2009, 1988, 2005, 2014)
boxoffice <- c(35, 15, 15, 62, 3, 10, 321, 79, 39, 1.5)
genre <- c("Comedy", "Horror", "Drama", "Adventure", "Documentary",
"Drama", "Science Fiction", "Comedy", "Science Fiction",
"Documentary")
time <- c(92, 97, 120, 90, 81, 158, 97, -84, 119, 84)
rating <- c("PG-13", "R", "R", "PG", "G", "R", "R",
"PG-13", "PG-13", "Unrated")
```
1. What is the name of the 10th movie in the list?
```{r}
movie[10]
```
2. What are the genres of the first 4 movies?
```{r}
genre[1:4]
```
3. Some joker put Spice World in the movie names -- it should be ``The Naked Gun'' Please correct the name.
```{r}
movie[movie == "Spice World"] <- "The Naked Gun"
```
4. What were the names of the movies made before 1990?
```{r}
movie[year < 1990]
```
5. How many movies were Dramas? What percent of the 10 movies were Comedies?
```{r}
sum(genre == "Drama")
mean(genre == "Comedy")
```
6. One of the values in the `time` vector is invalid. Convert any invalid values in this vector to NA. Then, calculate the mean movie time
```{r}
time[time < 0] <- NA
mean(time, na.rm = TRUE)
```
7. What were the names of the Comedy movies? What were their boxoffice totals? (Two separate questions)
```{r}
movie[genre == "Comedy"]
boxoffice[genre == "Comedy"]
```
8. What were the names of the movies that made less than \$50 Million dollars AND were Comedies?
```{r}
movie[boxoffice < 50 & genre == "Comedy"]
```
9. What was the median boxoffice revenue of movies rated either G or PG?
```{r}
median(boxoffice[rating %in% c("G", "PG")])
# OR
median(boxoffice[rating == "G" | rating == "PG"])
```
10. What percent of the movies were either rated R OR were comedies?
```{r}
mean(rating == "R" | genre == "Comedy")
```
## Chapter 8: Matrices and Dataframes
The following table shows the results of a survey of 10 pirates. In addition to some basic demographic information, the survey asked each pirate "What is your favorite superhero?"" and "How many tattoos do you have?""
```{r echo = FALSE}
superhero <- data.frame(
Name = c("Astrid", "Lea", "Sarina", "Remon", "Letizia", "Babice", "Jonas", "Wendy", "Niveditha", "Gioia"),
Sex = c("F", "F", "F", "M", "F", "F", "M", "F", "F", "F"),
Age = c(30, 25, 25, 29, 22, 22, 35, 19, 32, 21),
Superhero = c("Batman", "Superman", "Batman", "Spiderman", "Batman",
"Antman", "Batman", "Superman", "Maggott", "Superman"),
Tattoos = c(11, 15, 12, 5, 65, 3, 9, 13, 900, 0)
)
knitr::kable(superhero)
```
1. Combine the data into a single dataframe. Complete all the following exercises from the dataframe!
```{r}
piratesurvey <- data.frame(
name = c("Astrid", "Lea", "Sarina", "Remon", "Letizia", "Babice", "Jonas", "Wendy", "Niveditha", "Gioia"),
sex = c("F", "F", "F", "M", "F", "F", "M", "F", "F", "F"),
age = c(30, 25, 25, 29, 22, 22, 35, 19, 32, 21),
superhero = c("Batman", "Superman", "Batman", "Spiderman", "Batman",
"Antman", "Batman", "Superman", "Maggott", "Superman"),
tattoos = c(11, 15, 12, 5, 65, 3, 9, 13, 900, 0),
stringsAsFactors = FALSE
)
```
2. What is the median age of the 10 pirates?
```{r}
median(piratesurvey$age)
```
3. What was the mean age of female and male pirates separately?
```{r}
mean(piratesurvey$age[piratesurvey$sex == "F"])
mean(piratesurvey$age[piratesurvey$sex == "M"])
## OR
with(piratesurvey,
mean(age[sex == "F"]))
with(piratesurvey,
mean(age[sex == "M"]))
## OR
mean(subset(piratesurvey,
subset = sex == "F")$age)
mean(subset(piratesurvey,
subset = sex == "M")$age)
```
4. What was the most number of tattoos owned by a male pirate?
```{r}
with(piratesurvey,
max(tattoos[sex == "M"]))
# OR
max(subset(piratesurvey,
subset = sex == "M")$tattoos)
```
5. What percent of pirates under the age of 25 were female?
```{r}
with(piratesurvey,
mean(sex[age < 25] == "F"))
```
6. What percent of female pirates are under the age of 25?
```{r}
with(piratesurvey,
mean(age[sex == "F"] < 25))
```
7. Add a new column to the dataframe called `tattoos.per.year` which shows how many tattoos each pirate has for each year in their life.
```{r}
piratesurvey$tattoos.per.year <- with(piratesurvey, tattoos / age)
```
8. Which pirate had the most number of tattoos per year?
```{r}
piratesurvey$name[piratesurvey$tattoos.per.year == max(piratesurvey$tattoos.per.year)]
```
9. What are the names of the female pirates whose favorite superhero is Superman?
```{r}
piratesurvey$name[with(piratesurvey, sex == "F" & superhero == "Superman")]
```
10. What was the median number of tattoos of pirates over the age of 20 whose favorite superhero is Spiderman?
```{r}
with(piratesurvey, (tattoos[age > 20 & superhero == "Spiderman"]))
```
## Chapter 13: Hypothesis tests
1. Do male pirates have significantly longer beards than female pirates? Test this by conducting the appropriate test on the relevant data in the pirates dataset.
```{r}
beard.sex.htest <- t.test(formula = beard.length ~ sex,
subset = sex %in% c("male", "female"),
data = pirates)
beard.sex.htest
apa(beard.sex.htest)
```
*Answer*: Yes, men have significantly longer beards than women, `r apa(beard.sex.htest)`
2. Are pirates whose favorite pixar movie is Up more or less likely to wear an eye patch than those whose favorite pixar movie is Inside Out? Test this by conducting the appropriate test on the relevant data in the pirates dataset.
```{r}
df <- subset(pirates, fav.pixar %in% c("Up", "Inside Out"))
pixar.ep.table <- table(df$fav.pixar, df$eyepatch)
pixar.ep.htest <- chisq.test(pixar.ep.table)
pixar.ep.htest
apa(pixar.ep.htest)
```
*Answer*: Yes, pirates whose favorite movie is Inside Out are much more likely to wear an eye patch than those whose favorite Pixar movie is Up, `r apa(pixar.ep.htest)`
3. Do longer movies have significantly higher budgets than shorter movies? Answer this question by conducting the appropriate test in the movies dataset.
```{r}
budget.time.htest <- cor.test(formula = ~ budget + time,
data = movies)
budget.time.htest
apa(budget.time.htest)
```
*Answer*: Yes, longer movies tend to have higher budgets than shorter movies, `r apa(budget.time.htest)`
4. Do R rated movies earn significantly more money than PG-13 movies? Test this by conducting a the appropriate test on the relevant data in the movies dataset.
```{r}
revenue.rating.htest <- t.test(formula = revenue.all ~ rating,
subset = rating %in% c("R", "PG-13"),
data = movies)
revenue.rating.htest
apa(revenue.rating.htest)
```
*Answer*: No, R Rated movies do not earn significantly more than PG-13 movies. In fact, PG-13 movies earn significantly more than R rated movies.
5. Are certain movie genres significantly more common than others in the movies dataset?
```{r}
genre.table <- table(movies$genre)
genre.htest <- chisq.test(genre.table)
genre.htest
apa(genre.htest)
```
*Answer*: Yes, some movie genres are more common than others, `r apa(genre.htest)`
6. Do sequels and non-sequels differ in their ratings?
```{r}
genre.sequel.table <- table(movies$genre, movies$sequel)
genre.sequel.htest <- chisq.test(genre.sequel.table)
apa(genre.sequel.htest)
```
*Answer*: Yes, sequels are more likely in some genres than others.
Note: The error "Warning in chisq.test" we get in this code is due to the fact that some cells have no entries. This can make the test statistic unreliable. You can correct it by adding a value of 20 to every element in the table as follows:
```{r}
genre.sequel.table <- table(movies$genre, movies$sequel)
# Add 20 to each cell to correct for empty cells
genre.sequel.table <- genre.sequel.table + 20
# Here is the result
genre.sequel.table
# Run a chi-square test on the table
genre.sequel.htest <- chisq.test(genre.sequel.table)
# Print the result
genre.sequel.htest
```
## Chapter 14: ANOVA
1. Is there a significant relationship between a pirate's favorite pixar movie and the number of tattoos (s)he has? Conduct an appropriate ANOVA with `fav.pixar` as the independent variable, and `tattoos` as the dependent variable. If there is a significant relationship, conduct a post-hoc test to determine which levels of the independent variable(s) differ.
```{r}
pixar.aov <- aov(formula = tattoos ~ fav.pixar,
data = pirates)
summary(pixar.aov)
```
*Answer: No, there is no significant effect*
2. Is there a significant relationship between a pirate's favorite pirate and how many tattoos (s)he has? Conduct an appropriate ANOVA with `favorite.pirate` as the independent variable, and `tattoos` as the dependent variable. If there is a significant relationship, conduct a post-hoc test to determine which levels of the independent variable(s) differ.
```{r}
favpirate.aov <- aov(formula = tattoos ~ favorite.pirate,
data = pirates)
summary(favpirate.aov)
```
*Answer: No, there is no significant effect*
3. Now, repeat your analysis from the previous two questions, but include both independent variables `fav.pixar` and `favorite.pirate` in the ANOVA. Do your conclusions differ when you include both variables?
```{r}
pirpix.aov <- aov(formula = tattoos ~ favorite.pirate + fav.pixar,
data = pirates)
summary(pirpix.aov)
```
4. Finally, test if there is an interaction between `fav.pixar` and `favorite.pirate` on number of tattoos.
```{r}
pirpix.int.aov <- aov(formula = tattoos ~ favorite.pirate * fav.pixar,
data = pirates)
summary(pirpix.int.aov)
```
*Answer: Nope still nothing*
## Chapter 15: Regression
The following questions apply to the auction dataset in the yarrr package. This dataset contains information about 1,000 ships sold at a pirate auction.
1. The column jbb is the "Jack's Blue Book" value of a ship. Create a regression object called `jbb.cannon.lm` predicting the JBB value of ships based on the number of cannons it has. Based on your result, how much value does each additional cannon bring to a ship?
```{r}
library(yarrr)
# jbb.cannon.lm model
# DV = jbb, IV = cannons
jbb.cannon.lm <- lm(formula = jbb ~ cannons,
data = auction)
# Print jbb.cannon.lm coefficients
summary(jbb.cannon.lm)$coefficients
```
2. Repeat your previous regression, but do two separate regressions: one on modern ships and one on classic ships. Is there relationship between cannons and JBB the same for both types of ships?
```{r}
# jbb.cannon.modern.lm model
# DV = jbb, IV = cannons. Only include modern ships
jbb.cannon.modern.lm <- lm(formula = jbb ~ cannons,
data = subset(auction, style == "modern"))
# jbb.cannon.classic.lm model
# DV = jbb, IV = cannons. Only include classic ships
jbb.cannon.classic.lm <- lm(formula = jbb ~ cannons,
data = subset(auction, style == "classic"))
# Print jbb.cannon.modern.lm coefficients
summary(jbb.cannon.modern.lm)$coefficients
# Print jbb.cannon.classic.lm coefficients
summary(jbb.cannon.classic.lm)$coefficients
```
3. Is there a significant interaction between a ship's style and its age on its JBB value? If so, how do you interpret the interaction?
```{r}
# int.lm model
# DV = jbb, IV = interaction between style and age
int.lm <- lm(formula = jbb ~ style * age,
data = auction
)
# Print int.lm coefficients
summary(int.lm)$coefficients
```
4. Create a regression object called \texttt{jbb.all.lm} predicting the JBB value of ships based on cannons, rooms, age, condition, color, and style. Which aspects of a ship significantly affect its JBB value?
```{r}
# jbb.all.lm model
# DV = jbb, IV = everything (except price)]
jbb.all.lm <- lm(jbb ~ cannons + rooms + age + condition + color + style,
data = auction
)
# Print jbb.all.lm coefficients
summary(jbb.all.lm)$coefficients
```
5. Create a regression object called \texttt{price.all.lm} predicting the actual selling value of ships based on cannons, rooms, age, condition, color, and style. Based on the results, does the JBB do a good job of capturing the effect of each variable on a ship's selling price?
```{r}
# price.all.lm model
# DV = price, IV = everything (except jbb)]
price.all.lm <- lm(price ~ cannons + rooms + age + condition + color + style,
data = auction
)
# Print price.all.lm coefficients
summary(price.all.lm)$coefficients
```
6. Repeat your previous regression analysis, but instead of using the price as the dependent variable, use the binary variable *price.gt.3500* indicating whether or not the ship had a selling price greater than 3500. Call the new regression object `price.all.blr`. Make sure to use the appropriate regression function!!.
```{r}
# Create new binary variable indicating whether
# a ship sold for more than 3500
auction$price.gt.3500 <- auction$price > 3500
# price.all.blr model
# DV = price.gt.3500, IV = everything (except jbb)
price.all.blr <- glm(price.gt.3500 ~ cannons + rooms + age + condition + color + style,
data = auction,
family = binomial # Logistic regression
)
# price.all.blr coefficients
summary(price.all.blr)$coefficients
```
7. Using `price.all.lm`, predict the selling price of the 3 new ships
```{r, echo = FALSE}
new <- data.frame(cannons = c(12, 8, 32),
rooms = c(34, 26, 65),
age = c(43, 54, 100),
condition = c(7, 3, 5),
color = c("black", "black", "red"),
style = c("classic", "modern", "modern"),
stringsAsFactors = F
)
library(knitr)
knitr::kable(new)
```
```{r}
# Create a dataframe with new ship data
new.ships <- data.frame(cannons = c(12, 8, 32),
rooms = c(34, 26, 65),
age = c(43, 54, 100),
condition = c(7, 3, 5),
color = c("black", "black", "red"),
style = c("classic", "modern", "modern"),
stringsAsFactors = FALSE)
# Predict new ship data based on price.all.lm model
predict(object = price.all.lm,
newdata = new.ships
)
```
8. Using `price.all.blr`, predict the probability that the three new ships will have a selling price greater than 3500.
```{r}
# Calculate logit of predictions
log.pred <- predict(object = price.all.blr,
newdata = new.ships
)
# Convert logits to probabilities
1 / (1 + exp(-log.pred))
```