-
-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathpart-01-02-the-basics.qmd
186 lines (132 loc) · 6.78 KB
/
part-01-02-the-basics.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
---
date-modified: 2024-05-27
---
# R Basics
R [@R] is a programming language oriented to statistical computing. R has become the *de facto* programming language in the social network community due to the large number of packages available for network analysis. R packages are collections of functions, data, and documentation that extend R. A good reference book for both novice and advanced users is ["The Art of R programming"](https://nostarch.com/artofr.htm) @Matloff2011[^matlof].
[^matlof]: [Here](http://heather.cs.ucdavis.edu/~matloff/145/PLN/RMaterials/NSPpart.pdf) a free pdf version distributed by the author.
## Getting R
You can get R from the Comprehensive R Archive Network website [CRAN] ([link](https://cran.r-project.org/){target="_blank"}). CRAN is a network of servers worldwide that store identical, up-to-date versions of code and documentation for R. CRAN website also has a lot of information about R, including manuals, FAQs, and mailing lists.
Although R comes with a Graphical User Interface [GUI], I recommend getting an alternative like [RStudio](https://posit.com){target="_blank"} or [VSCode](https://code.visualstudio.com/){target="_blank"}. RStudio and VSCode are excellent companions for programming in R. While RStudio is more common among R users, VSCode is a more general-purpose IDE that can be used for many other programming languages, including Python and C++.
## How to install packages
Nowadays, there are two ways of installing R packages (that I'm aware of), either using `install.packages`, which is a function shipped with R, or using the [`devtools`](https://cran.r-project.org/package=devtools) R package to install a package from some remote repository other than CRAN, here are a few examples:
```r
# This will install the igraph package from CRAN
> install.packages("netdiffuseR")
# This will install the bleeding-edge version from the project's GitHub repo!
> devtools::install_github("USCCANA/netdiffuseR")
```
The first one, using `install.packages`, installs the CRAN version of [`netdiffuseR`](https://r-project.org/package=netdiffuseR), whereas the line of code installs whatever version is published on https://github.com/USCCANA/netdiffuseR, which is usually called the development version.
In some cases, users may want/need to install packages from the command line as some packages need extra configuration to be installed. But we won't need to look at it now.
## A ~~gentle~~ Quick n' Dirty Introduction to R
Some common tasks in R
0. Getting help (and reading the manual) is *THE MOST IMPORTANT* thing you should know about. For example, if you want to read the manual (help file) of the `read.csv` function, you can type either of these:
```r
?read.csv
?"read.csv"
help(read.csv)
help("read.csv")
```
If you are not fully aware of what is the name of the function, you can always use the *fuzzy search*
```r
help.search("linear regression")
??"linear regression"
```
1. In R, you can create new objects by either using the assign operator (`<-`) or the equal sign `=`, for example, the following two are equivalent:
```r
a <- 1
a = 1
```
Historically, the assign operator is the most commonly used.
2. R has several types of objects. The most basic structures in R are `vectors`, `matrix`, `list`, `data.frame`. Here is an example of creating several of these (each line is enclosed with parenthesis so that R prints the resulting element):
```{r}
#| label: creating
(a_vector <- 1:9)
(another_vect <- c(1, 2, 3, 4, 5, 6, 7, 8, 9))
(a_string_vec <- c("I", "like", "netdiffuseR"))
(a_matrix <- matrix(a_vector, ncol = 3))
# Matrices can be of strings too
(a_string_mat <- matrix(letters[1:9], ncol=3))
# The `cbind` operator does "column bind"
(another_mat <- cbind(1:4, 11:14))
# The `rbind` operator does "row bind"
(another_mat2 <- rbind(1:4, 11:14))
(a_string_mat <- matrix(letters[1:9], ncol = 3))
(a_list <- list(a_vector, a_matrix))
# same but with names!
(another_list <- list(my_vec = a_vector, my_mat = a_matrix))
# Data frames can have multiple types of elements; it
# is a collection of lists
(a_data_frame <- data.frame(x = 1:10, y = letters[1:10]))
```
3. Depending on the type of object, we can access its components using indexing:
```{r indexing}
# First 3 elements
a_vector[1:3]
# Third element
a_string_vec[3]
# A sub matrix
a_matrix[1:2, 1:2]
# Third column
a_matrix[,3]
# Third row
a_matrix[3,]
# First 6 elements of the matrix. R stores matrices
# by column.
a_string_mat[1:6]
# These three are equivalent
another_list[[1]]
another_list$my_vec
another_list[["my_vec"]]
# Data frames are just like lists
a_data_frame[[1]]
a_data_frame[,1]
a_data_frame[["x"]]
a_data_frame$x
```
4. Control-flow statements
```{r control-flow}
# The oldfashion forloop
for (i in 1:10) {
print(paste("I'm step", i, "/", 10))
}
# A nice ifelse
for (i in 1:10) {
if (i %% 2) # Modulus operand
print(paste("I'm step", i, "/", 10, "(and I'm odd)"))
else
print(paste("I'm step", i, "/", 10, "(and I'm even)"))
}
# A while
i <- 10
while (i > 0) {
print(paste("I'm step", i, "/", 10))
i <- i - 1
}
```
5. R has a compelling set of pseudo-random number generation functions. In general, distribution functions have the following name structure:
a. Random Number Generation: `r[name-of-the-distribution]`, *e.g.*, `rnorm` for normal, `runif` for uniform.
b. Density function: `d[name-of-the-distribution]`, e.g. `dnorm` for normal, `dunif` for uniform.
c. Cumulative Distribution Function (CDF): `p[name-of-the-distribution]`, *e.g.*, `pnorm` for normal, `punif` for uniform.
d. Inverse (quantile) function: `q[name-of-the-distribution]`, e.g. `qnorm` for the normal, `qunif` for the uniform.
Here are some examples:
```{r random-numbers}
# To ensure reproducibility
set.seed(1231)
# 100,000 Unif(0,1) numbers
x <- runif(1e5)
hist(x)
# 100,000 N(0,1) numbers
x <- rnorm(1e5)
hist(x)
# 100,000 N(10,25) numbers
x <- rnorm(1e5, mean = 10, sd = 5)
hist(x)
# 100,000 Poisson(5) numbers
x <- rpois(1e5, lambda = 5)
hist(x)
# 100,000 rexp(5) numbers
x <- rexp(1e5, 5)
hist(x)
```
More distributions are available at `??Distributions`.
For a nice intro to R, take a look at ["The Art of R Programming" by Norman Matloff](https://nostarch.com/artofr.htm). For more advanced users, take a look at ["Advanced R" by Hadley Wickham](http://adv-r.had.co.nz/).