# Getting Started with R

## The RStudio interface

The first screen we see when we open **R**Studio should look like this:

[![The first screen we see when we open **R**Studio](./images/rstudio_first_screen.jpg)](./images/rstudio_first_screen.jpg)

Here, there are three main window panes. The "Console" window is where we type
the code to tell the computer to do stuff.

The "Environment-History-Connections" window has three tabs. "Environment" 
shows the objects that we save during our session (I will explain this 
soon); "History" shows a record of all the code we ask the computer to run; and
"Connections" (which we will not use here) shows us the connections we have to
remote databases.

The "Files-Plots-Packages" window has several tabs. "Files" shows us all the 
files in our current working directory, which is the default location where
**R** will look for files we want to load and where it will put any files we
save. "Plots" will display the plots we make in **R**. "Packages" shows us the
packages installed on our computer and whether they are loaded in our current
session. "Help" allows us to search and read the documentation for packages and 
functions. "Viewer" can display content that usually belongs in a web browser 
(e.g., HTML files).

We can tell **R** to do stuff by typing *commands* in the line that begins with
`>` in the console window. This line is called the "command prompt". Let's begin
with something simple:

```{r simplest operation}
1 + 1
```

The `[1]` that appears next to the result informs us that the line begins with
the first value in the result. This information is helpful when we run commands
that produce multiple values, like in this case:

```{r quick sequence of numbers}
80:110
```

If we type an incomplete command, **R** will assume that we are still writing
and will show a `+`. This `+` is *not* an arithmetical operator; it is a prompt
for us to continue writing in the next line. The prompt will not go away unless
we complete the command or we press the "Escape" key. For example:

```{r}
#| eval: false
> 42 -
+
+
+ 1
[1] 41
```

If **R** does not understand a command, it will display an error message.

```{r}
#| eval: false
> 7 % 2
Error: unexpected input in "7 % 2"
>
```

In this case, what I meant was:

```{r}
7 %% 2
```

## Using R as a calculator

**R** can perform basic arithmetic operations. Type the
following expressions at the command prompt (the line that begins with `>`) in
the **R** console window.

```{r simple arithmetic}
5 + 3
5 - 3
5*3
5/3
```

**R** can also do exponentiation with `^`^[`**` also works for exponentiation,
but it's better to stick to `^`.]; modulus, also known as remainder from
division, with `%%`; and integer division with `%/%`.
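
As a quick illustration of these three operators (the numbers are arbitrary
choices, picked only to make the results easy to check by hand):

```{r}
2^10     # exponentiation: 2 multiplied by itself 10 times
17 %% 5  # modulus: the remainder after dividing 17 by 5
17 %/% 5 # integer division: how many whole times 5 fits into 17
```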

If we try to do shoddy math, **R** will inform us (but not necessarily with an
error message):

```{r shoddy math}
#| error: true
0/0
-1/0
```

**R** also has logical operators that will return `TRUE` or
`FALSE`.

```{r logical operators}
5 == 6
5 != 6
5 < 6
```


The logical operators "and" (`&`) and "or" (`|`) allow us to combine multiple logical statements:

```{r conjunction and disjunction}
(5 < 6) & (7 == 8)
(5 < 6) | (7 == 8)
```

::: {.callout-tip}
## Practice

Think of an integer, double it, add six, divide it in half, subtract the number
you started with, and then square it. If your code is correct, the answer should
be nine.
:::

**Be careful of comparisons and floating point arithmetic**:

```{r floating point arithmetic}
(.1 + .2) == .3
(.1 + .2) - .3
```

<!--
The `all.equal` function will test if two values are "close enough":

`> all.equal(5, sqrt(5)^2)`

`> ?all.equal`
-->

This is not **R**'s fault. All computers have limited space to store numbers,
and most decimal fractions (like `0.1`) have no exact representation in the
binary format that computers use. So, computers often store numbers with
"hidden" decimals even when it looks like they use round numbers. For example,
what looks like `0.3` is stored as a number slightly below `0.3`, while the
result of `0.1 + 0.2` is stored as a number slightly above it. This is common
enough to justify an entry in the **R** [FAQ (7.31)](https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f).
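
One possible workaround, offered here as a sketch rather than a rule, is to
compare numbers with a small tolerance instead of using `==`. Base **R**'s
`all.equal()` tests whether two values are "close enough", and wrapping it in
`isTRUE()` makes sure we always get a single `TRUE` or `FALSE`:

```{r}
isTRUE(all.equal(0.1 + 0.2, 0.3)) # equal within a small tolerance
isTRUE(all.equal(5, sqrt(5)^2))   # same idea after taking a square root
```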

Being able to do these computations is nice, but what happens if we want to use
a result later without redoing the computation? **R** can save numbers (and
more) using something called "objects".

## Objects

In simple terms, an object is a named piece of information that we can access
whenever we want. A single number, a word, a plot, and a set of instructions can
all be objects if we name them. To create an object, we must first choose a
name, then type an arrow `<-` (a "less than" symbol `<` followed by a "minus"
sign `-`), and then write the value that we want to save.

```{r create first object}
my_object <- 42
```

Another way of creating the object is to use the "equals" sign `=` instead of
the arrow.

```{r create first object method 2}
my_object = 42
```

In this context, both `<-` and `=` are acting as "assignment operators" because
they assign the value `42` to the name `my_object`. In **R**, `=` has several
uses, so if you want to avoid confusion, I suggest you use `<-`. But you can
pick whichever symbol you like the most as long as you are consistent.

Now the environment window shows that there is an object called `my_object`,
which we can access at any time by typing its name.

[![The environment pane with an object](./images/environment_with_object.jpg)](./images/environment_with_object.jpg)

```{r call my_object}
my_object
```

`my_object` has a number as its value, so we can do mathematical operations with
it.

```{r}
my_object + 7
my_object^my_object
```

We can also create objects based on other objects.

```{r}
a <- 10
b <- a
a
b
```

Be careful: if we try to reuse a name, **R** will overwrite the previous object
without notifying us in any way.

```{r overwriting object}
c <- 2
c
c <- 711
c
```

However, if an object is a copy of another object, overwriting the original
does not affect the copy, and overwriting the copy does not affect the
original.

```{r}
a <- 10
b <- a
b
a <- 5
b
```

We can name our objects in almost any way we want. The only rules are:

+ Names *can* be a combination of letters, digits, periods `.` and underscores
  `_`.
+ Names *cannot* include white spaces.
+ If a name starts with a period `.`, it *cannot* be followed by a digit.
+ Names *cannot* start with a number or an underscore `_`.
+ Names are case sensitive (for example, `age`, `Age` and `AGE` are three
different objects).
+ Reserved words (like `TRUE`, `FALSE`, `NULL`, `if`, ...) *cannot* be used as
names.
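
To see these rules in action, here is a small sketch. The object names and
values are made up for illustration. The commented lines at the end break one
rule each, so they would produce an error if we removed the leading `#`.

```{r}
age_years <- 30   # letters and underscores
age.years2 <- 31  # periods and digits are fine after the first character
.age_hidden <- 32 # a leading period is fine when a letter follows it
Age_Years <- 33   # a different object from age_years (names are case sensitive)
age_years
Age_Years
# 2nd_age <- 34   # starts with a digit
# _age <- 35      # starts with an underscore
# age years <- 36 # contains a white space
# TRUE <- 37      # reserved word
```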

Also, here are a few suggestions that can save you many hours of frustration:

+ Avoid giving your object the same name as a built-in function.
+ If you need to create objects with multiple words in their name, separate them
with an underscore (`my_object`) or a dot (`my.object`), or capitalize the
different words (`MyObject`). Choose whichever format you like most and be
consistent with it.
+ Use informative names. It is quick and easy to use names like `x` or `value1`.
But your code will be easier and faster to understand if your objects have names
that illustrate their purposes. Your colleagues and your future self will really
appreciate it.

## Using functions

**R** has many built-in functions to do anything from basic math to text
manipulation to advanced statistical models. **R** thinks of functions as
objects that store commands that we can apply to an input. For example, we can
input a number to round it, or to calculate its square root, its factorial, or
its natural logarithm. To use a function, we must write the name of a function
followed by the input we want it to use inside a pair of round brackets.

```{r simple mathematical functions}
#| results: hide
round(2.1415)
sqrt(9)
factorial(7)
log(100)
```

The pieces of information inside the round brackets are called "arguments". Some
functions, like `sqrt()` and `factorial()`, only use one argument (the number
that we want to use). But other functions, like `round()` and `log()`, can take
more arguments (separated by commas) to modify their behavior. For example,
`round()` accepts a second argument that corresponds to the number of decimal
places that we want to keep.

```{r rounding to different decimal places}
round(2.1415, 0)
round(2.1415, 2)
```
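
`log()` behaves in a similar way. As a sketch, assuming we want a logarithm in
base 10 instead of the natural logarithm, we can pass the base as a second
argument:

```{r}
log(100)     # natural logarithm (base e)
log(100, 10) # logarithm in base 10
```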

The order of these arguments is important (what happens if we try to run
`round(2, 2.1415)`?). Fortunately, arguments have names that we can use to
specify which data to use in each case. So, we can write:

```{r naming arguments}
round(x = 2.1415, digits = 3)
```

Even better, we can pass arguments in any order as long as we name them all.

```{r}
round(digits = 3, x = 2.1415)
```

Note that using a wrong argument name in a function call will likely yield an error message.

```{r}
#| error: true
round(x = 2.1415, basket = 3)
```

A quick way to check what arguments we can use is to invoke `args()`, which
takes the name of a function as its argument:

```{r}
args(round)
```

The output shows that `round()` has two arguments: `x`, which is the number we
want to round; and `digits`, which determines how many decimal places to keep.
The output also shows that `digits` is already set to zero. This means that
`round()` will use 0 as the default value for `digits`, i.e., it will not keep
any decimal places. We can omit arguments with default values and **R** will
automatically fill them for us in the background.
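
A quick way to convince ourselves is to run the same call with and without the
default written out. Both lines in this sketch should print the same value:

```{r}
round(2.1415)             # digits omitted, so R falls back to the default of 0
round(2.1415, digits = 0) # the same call with the default written out
```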

I encourage you to name all the arguments in the function, or at least the ones
after the first argument. It is hard to remember all the arguments that can go
into more complicated functions. And it is even harder to remember them in the
correct order. Accidentally switching unnamed arguments can produce wrong
results without us knowing. Naming the arguments will prevent errors and clarify
the function's behavior.

We can also run functions inside other functions. This works because **R** first
runs the innermost function and then uses the result in the outermost function.

```{r nested function}
sqrt(round(16.483, digits = 0))
```

The functions we have used are simple and intuitive to use. But sometimes we
need to know more about a function before we can use it. Where can we find
explanations to help us?

## Getting help

The `help()` function allows us to access **R**'s built-in help information on
any function. For example, to open the help page for `round()`, we do

```{r help function}
#| eval: false
help(round)
```

A shorter way of writing this is to use `?` before the name of the function.

```{r help shortcut}
#| eval: false
?round
```

After you run the code, the help page is displayed in the "Help" tab in the
"Files-Plots-Packages" pane (usually in the bottom right of **R**Studio).

To a novice user, help pages may seem arcane, perhaps because they aim for
brevity and use jargon. But this terse jargon makes (most of) the explanations
precise, so we can use the information we need without having to read the entire
document. Also, all help pages are organized similarly, so we don't have to
relearn how to navigate them. With a bit of practice, you will be able to find
exactly what you need in mere seconds.

The first line of the help document displays the name of the function and the
package that contains the function. Other sections are:

+ **Description**: summarizes the function.
+ **Usage**: names the arguments associated with the function and possible
default values.
+ **Arguments**: describes the purpose and format of each input.
+ **Details**: explicates what the function does and how it does it.
+ **Value**: if applicable, describes the type and structure of the object that
the function returns.
+ **See Also**: suggests other help pages with similar or related content.
+ **Examples**: shows code to illustrate how to use the function. We can copy
and paste examples in the console, and we can access them at any time by using
the `example()` function (e.g., `example("round")`).

`help()` is useful if we know the full name of the function. But if all we remember is a keyword in the name, we can search through **R**'s help system using `help.search()`:

```{r}
#| eval: false
help.search("round")
```

Note that in this case we have to use quotes `" "` around the search term. We can also use the shortcut `??`:

```{r}
#| eval: false
??round
```

As before, the "Help" tab in **R**Studio will display the results of the search.
`help.search()` looks for the pattern in the help documentation, code
demonstrations, and package vignettes and displays the results as clickable
links that we can follow.

Another useful function is `apropos()`, which lists all functions whose names
contain a specified character string. Note that `apropos()` returns names even
when the word we used is only part of a larger word, like in "surround". For
example, to find all functions with `round` in their name, we use^[This output
should show 6 names instead of 3. I cannot fix the output in the notes, but the
code works fine inside **R**Studio.]

```{r}
apropos("round")
```

A further option is `RSiteSearch()`, which allows us to search for
keywords and phrases in function help pages and vignettes for *all* CRAN
packages, and in CRAN task views. This way we can access the 
[online search engine](https://www.r-project.org/search.html) 
directly from the console and display the results in our web browser.

```{r}
#| eval: false
RSiteSearch("regression")
```

::: {.callout-tip}
## How to get started with help pages?
Start by reading the help pages of functions that you already understand. This
will teach you how to understand the structure of the pages and will familiarize
you with the jargon. As you use **R** you will likely need other, more
complicated functions, so reading more help pages will happen almost naturally.

Keep in mind that help pages explain the code, not the underlying concepts. If
you don't know what it means to round a number, reading the documentation for
`round()` will not help you.
:::

<!--
Now that we know how to get help, we can move on to more advanced stuff.
--> 

Until now, the console satisfied all our coding needs. But if we wanted to reuse
a command from before, we would have to either rewrite it from scratch, or look
tediously through our command history in the "Environment-History-Connections"
window. Besides, once we close this session in **R**Studio, all of our work will
be gone, like tears in the rain. It would be much better if we wrote our code in
a place where we can easily review and modify it.

## Working with scripts

A script is a plain text file where we write our code. With a script we can
edit, proofread, and reuse code; we can easily write multi-line code (which is
cumbersome to do in the console); and we can save our code so we can come back
to it later or share it with others. I think the best way to code in **R** is to
use a script, and I strongly suggest you always use one. 

To open a new script, click on `File > New File > R Script` in the menu bar on the upper left-hand side of the screen.

[![A screen with a new script](./images/new_script_screen.jpg)](./images/new_script_screen.jpg)

We can run a line of code in the script using the `Run` button in the upper
right-hand side of the script window (see picture below). **R** will run
whichever line of code your cursor is on. We can also run all of the code in the
script by clicking the `Source` button that's next to `Run`. I find that
clicking buttons is slow, so I use a keyboard shortcut for running a line of
code: `Control + Enter` on Windows and Linux; `Command + Enter` on Mac.

[![Buttons for running code](./images/run_code_buttons_edited.jpg)](./images/run_code_buttons_edited.jpg)

To save your script, you can click on the blue square^[Young reader, this square
is a [floppy disk](https://en.wikipedia.org/wiki/Floppy_disk); not-so-young 
reader, I feel your pain.] right below the tab with the script name. The first 
time you save your script, **R**Studio will ask you to choose a name and a location 
to save your file in. I strongly suggest that you pick a name and a location 
that you will remember easily in the future.

Moving forward, I will write all my code in a script and will assume you are
doing the same. Trust me, it's for your own benefit.

Now that we know how to preserve our hard-gained code, we can do more
elaborate work. Previously we worked with one number at a time. But we can
also work with groups of numbers (and of other stuff) by using something called
"vectors".

## Working with vectors

In **R**, an ordered group of numbers is called a vector. To create a vector of
numbers, we need to use the function `c()` (short for "combine"). The arguments
of `c()` are the numbers you want to use in the vector, in the order you want to
use them.

```{r first vector}
my_vec <- c(5, 3, 7, 1, 1, 8) # This is a vector of numbers
my_vec
```

::: {.callout-note}
## Comments 
Did you see the text I added next to the code? **R** will ignore everything in a
line that comes after a hash symbol `#`. This `#` is known as the *comment symbol*
because it allows us to comment our code. Comments can explain confusing chunks
of code, or warn about potential problems.
:::

Vectors can also contain other types of data, like words:

```{r vector of names}
vector_of_words <- c("monday", "lemon")
vector_of_words
```

Note that we must enclose words in quotation marks to let **R** know that we
want to use "monday" and "lemon" as values and not as names for objects. Look at
what happens if we forget the quotation marks:

```{r}
#| error: true
vector_of_words <- c(monday, lemon)
```

Later on we will work more with vectors of words, but for now let's focus on
vectors of numbers.

### Named vectors

We can name the elements of a vector if we add a name followed by an equals sign
before the value that we want to save:

```{r named vector}
my_named_vec <- c(tigers = 2, lions = 5)
my_named_vec
```

Named vectors help us store more information in the same object. They also give
us another way of accessing the elements inside the vector.
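
As a small sketch of that extra information, base **R**'s `names()` function
returns only the names stored in a vector. The vector `zoo_counts` below is a
made-up example:

```{r}
zoo_counts <- c(tigers = 2, lions = 5, bears = 3) # made-up counts
names(zoo_counts) # the names, without the values
```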

### Operations with numerical vectors

**R** is a "vectorized" language, which means that it can often operate on an
entire vector of numbers as easily as on a single number. All the logical and
mathematical functions we used before work with vectors:

```{r simple math with vectors}
my_vec / 11
my_vec <= 7
log(my_vec)
my_vec*my_vec
```

In the last example, **R** did not obey the rules of linear algebra to multiply
two vectors. Instead, **R** used "element-wise execution", which means that
**R** applied the same operation to each element of the vector independently of
the other elements.

To decide how to apply element-wise execution, **R** considers the *length* of
the vectors, which refers to the number of elements inside them. When we use two
vectors with the same length, **R** will line up the vectors and perform a
sequence of individual operations. For instance, in `my_vec*my_vec`, **R**
multiplies the first element of vector 1 by the first element of vector 2, then
the second element of vector 1 by the second element of vector 2, and so on,
until all elements are multiplied. The result will be a new vector with the same
length as the first two. 

When we use two vectors with *different* lengths, **R** will repeat the shorter
vector until it has as many elements as the longer vector, and then it will do
the same as before. For example:

```{r recycling vectors 1}
my_vec * c(1, 2)
```

If the length of the short vector does not divide evenly into the length of the 
long vector, **R** will do an incomplete repeat of the shorter vector and return
a warning. 

```{r recycling vectors 2}
my_vec * c(1, 2, 3, 4)
```

Repeating the numbers of the vector is known as "vector recycling", and it helps **R** do element-wise operations.
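
To make recycling visible, the sketch below writes the repetition out by hand
with base **R**'s `rep()` function. The vector `recycle_demo` is a made-up
example with six numbers, so `c(1, 2)` has to be repeated three times. Both
multiplications should give the same result.

```{r}
recycle_demo <- c(5, 3, 7, 1, 1, 8)    # six arbitrary numbers
recycle_demo * c(1, 2)                 # R recycles c(1, 2) for us
recycle_demo * rep(c(1, 2), times = 3) # the same repetition, written out
```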

Element-wise execution allows us to apply the same instruction to all elements 
in an object rather than one element at a time. When working with a data set, 
element-wise execution will ensure that values from one observation or case are 
only paired with values from the same observation or case. Element-wise 
execution also facilitates writing our own functions in **R**.

**R** can do vector and matrix multiplications, but we have to ask for them 
explicitly. For example, to get the inner product, we need the operator `%*%`. 
And to get the outer product, we need `%o%`. Don't worry if you are not familiar
with matrix operations; you won't need them in these notes.
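
For the curious, here is a minimal sketch that contrasts the three kinds of
multiplication. The vectors `vec_u` and `vec_v` are made-up examples.

```{r}
vec_u <- c(1, 2, 3)
vec_v <- c(4, 5, 6)
vec_u * vec_v   # element-wise: three separate products
vec_u %*% vec_v # inner product: 1*4 + 2*5 + 3*6, returned as a 1 x 1 matrix
vec_u %o% vec_v # outer product: a 3 x 3 matrix with every pairwise product
```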

### Extracting elements

We can access specific elements of vectors using the square bracket `[ ]`
notation. To use it, we first write the name of the vector we want to extract
from, followed by the square brackets with an index of the element we wish to
extract. This index can be a position or a logical statement. To extract
elements based on their position we simply write the position number inside the
`[ ]`. For example, to extract the 3rd value of `my_vec`, we use

```{r extract value from my_vec}
my_vec[3]
```

We can store this value in another object.

```{r}
value_3 <- my_vec[3]
```

We can also extract multiple elements at a time by using a vector of indices
inside the square brackets:

```{r}
my_vec[c(1, 3, 5)]
```

Or we can use `:` notation. Remember that `:` helps us create a sequence of
values. For example,

```{r}
3:6
```

So,

```{r}
my_vec[3:6]
```

::: {.callout-note}
## Note 
In **R**, the positional index starts at 1, so to call the first element of a
vector we need to use `[1]`. In many other programming languages (like Python
and C++), the positional index starts at 0.
:::

If the elements of the vector have names, we can extract a value using its name
(surrounded by quotes) instead of a positional index.
```{r extracting named element}
my_named_vec
my_named_vec["lions"]
```

Another convenient way to extract elements from a vector is to use a logical
expression as an index. For example, to extract all elements greater than 4 from
`my_vec`, we do

```{r}
my_vec[my_vec > 4]
```

This works because **R** uses element-wise execution even for logical
statements. So, `my_vec > 4` asks if each item of `my_vec` meets the condition
"greater than four" and returns the corresponding vector of `TRUE` and `FALSE`.
Then, when we place this result inside the square brackets, **R** examines each
element of `my_vec` asking "should I extract this element?". If the answer is
`TRUE`, the value is extracted; if it's `FALSE`, the value is ignored. Under the
hood, using `my_vec > 4` is equivalent to

```{r}
my_vec[c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)]
```

Two useful functions to extract values of a vector are `head()`, which extracts
the first few values of a vector; and `tail()`, which gives us the last few
values of a vector. Try them yourself!
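
If you want something to compare your attempt against, here is a sketch with a
made-up vector called `countdown`. Both functions accept an optional argument
`n` that sets how many values to return.

```{r}
countdown <- 10:1      # the numbers from 10 down to 1
head(countdown, n = 3) # the first three values
tail(countdown, n = 3) # the last three values
```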

### Replacing elements

We can replace the elements of a vector by combining the square bracket notation with the assignment operator. For example, to replace the second element of `my_vec`, we do

```{r replace element}
my_vec[2] <- 99
my_vec
```

To replace multiple elements with the same value, say, elements 5 and 6, we do

```{r}
my_vec[c(5, 6)] <- 55
my_vec
```

**R** can also do element-wise replacement:

```{r}
my_vec[c(5, 6)] <- c(100, 200)
my_vec
```

What happens if you try to replace two values with a vector that has three (or more) values?

Logical expressions help us replace values that meet specific conditions without having to find them ourselves.

```{r}
my_vec[my_vec > 44] <- -1
my_vec
```

### Reordering elements of vectors

To sort the elements of a vector from lowest to highest, we can use `sort()`:

```{r}
sorted_vec <- sort(my_vec)
sorted_vec
```

If we want to sort from highest to lowest, we need to set the optional argument `decreasing` to `TRUE`.

```{r}
sorted_vec_decreasing <- sort(my_vec, decreasing = TRUE)
sorted_vec_decreasing
```

Another option is to first use `sort()` and then reverse the sorted vector using `rev()`.

```{r}
sorted_vec_decreasing <- rev(sort(my_vec))
sorted_vec_decreasing
```

A more useful feature of vectors is that we can reorder their elements based on the values of other vectors. To show this, let's first create a vector of city names and a vector with (my guess of) their typical daily temperatures in degrees Fahrenheit.

```{r}
cities <- c("Tokyo", "Cairo", "Mexico City", "Helsinki")
temps_fahr <- c(50, 90, 65, -10)
```

Now imagine that we want to order the vector of cities, going from coldest to hottest. The first step is to use `order()` to create a new variable called "temps_ordered".

```{r}
temps_ordered <- order(temps_fahr)
temps_ordered
```

This output says that the lowest value in `temps_fahr` is in the fourth position, the second lowest value is in the first position, and so on. We can think of `temps_ordered` as a vector of positional indices of temps in ascending order. Now we can use these indices to reorder the vector of cities.

```{r}
cities_ordered <- cities[temps_ordered]
cities_ordered
```

Ta-da! 

These vector manipulations can do more than dazzle your friends. Imagine that we
have a data set with two columns of data and that we want to sort it based on
the values of the first column. If we use `sort()` on each column separately,
the values of each column will become uncoupled from each other. Instead, we can
use `order()` on one column to make a vector of positional indices. Then we can
use these indices on both columns to keep the values of each coupled in the
original way.
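
Here is a minimal sketch of that idea, with two made-up vectors standing in for
the two columns. Using the same indices on both vectors keeps each name next to
its height, whereas sorting the names separately breaks the pairing.

```{r}
# Made-up data: two "columns" stored as separate vectors
plant_height <- c(30, 12, 25)
plant_name <- c("fern", "moss", "ivy")

height_order <- order(plant_height) # positions from shortest to tallest
plant_height[height_order]          # heights, sorted
plant_name[height_order]            # names, still matched to their heights

sort(plant_name) # alphabetical order: no longer matches the sorted heights
```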

So far we have relied on **R**'s built-in capabilities to do everything we need. But sometimes we need to do something for which there isn't a built-in function. In these cases, we can get inventive and write a function ourselves.

## Writing our own functions

Functions, as you may remember, are objects that store commands. The three basic parts of a function are a name, a body of code, and a set of arguments. To assemble these parts, we can use the `function()` function (yes, really) followed by a pair of curly brackets `{}`:

```{r}
my_function <- function() {}
```

When we call our function, **R** will run the code that we wrote inside the curly brackets. This code is called the body of the function. Let's try something simple, like adding 1 + 1:

```{r make simple_function}
simple_function <- function() {
    1 + 1 # This function is very simple
}
```

::: {.callout-tip}
## Indents make code more readable
Indenting the body of the function helps the reader notice that the code is only supposed to run inside the function. Indentation doesn't affect our functions, but it is helpful and predominant among **R** coders.
:::

To run our function, we have to write its name followed by round brackets, just like with any other function:

```{r run simple function}
simple_function()
```

Remember to write the round brackets even if they are empty. The round brackets make **R** *run* the code inside the function. If we don't write these brackets, **R** will *show us* the code inside the function (try it!).

Let's write a function to convert Fahrenheit degrees to Celsius. We want to use this function with different temperatures, so we need to include an argument that will tell **R** which temperature to convert each time.

```{r fahr to celsius without function}
fahr_to_cels <- function(temperature) {
    (temperature - 32) / 1.8
}
fahr_to_cels(27)
```
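
Because the body of `fahr_to_cels()` only uses vectorized arithmetic, the same
formula should also work on a whole vector of temperatures. The sketch below
repeats the definition under the hypothetical name `fahr_to_cels_vec()` so that
it can run on its own. The input temperatures are my own picks.

```{r}
fahr_to_cels_vec <- function(temperature) {
    (temperature - 32) / 1.8 # same formula as before
}
# Freezing point of water, a mild day, and boiling point of water
fahr_to_cels_vec(c(32, 50, 212))
```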

Now let's try something a bit more complicated: solving a quadratic equation. If we have an equation of the form $ax^2 + bx + c = 0$, then the solutions are given by $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$. We can write a function that will apply this formula for us.

```{r solve_quadratic incomplete}
solve_quadratic <- function(a, b, c) {
    # Quadratic equations can have two solutions
    # Note the brackets around 2*a: without them, R would divide by 2 and then multiply by a
    solution_1 <- (-b + sqrt(b^2 - 4*a*c)) / (2*a) # First solution
    solution_2 <- (-b - sqrt(b^2 - 4*a*c)) / (2*a) # Second solution
}
```

Now all we need is to identify values of $a, b,$ and $c$ to pass as arguments.

```{r}
solve_quadratic(a = 1, b = -1, c = -3)
```

Why didn't our function show a result? When we run a function, **R** runs the
code in the body and returns the result of the last line of code. In
`solve_quadratic()`, the last line is an assignment: it saves the second
solution but doesn't show it. We need more code to ensure that the last line
displays both solutions. Since we are using a script (right?), it is easy to add
one more line to our function:

```{r solve_quadratic}
solve_quadratic <- function(a, b, c) {
    # Quadratic equations can have two solutions
    # Note the brackets around 2*a: without them, R would divide by 2 and then multiply by a
    solution_1 <- (-b + sqrt(b^2 - 4*a*c)) / (2*a) # First solution
    solution_2 <- (-b - sqrt(b^2 - 4*a*c)) / (2*a) # Second solution
    c(solution_1, solution_2) # Show vector of solutions
}
solve_quadratic(a = 1, b = -1, c = -3)
```

A more explicit way of making sure our function returns its result is to use `return()` inside the function:

```{r solve_quadratic with return()}
solve_quadratic <- function(a, b, c) {
    # Quadratic equations can have two solutions
    # Note the brackets around 2*a: without them, R would divide by 2 and then multiply by a
    solution_1 <- (-b + sqrt(b^2 - 4*a*c)) / (2*a) # First solution
    solution_2 <- (-b - sqrt(b^2 - 4*a*c)) / (2*a) # Second solution
    return(c(solution_1, solution_2)) # Return vector of solutions
}
solve_quadratic(a = 1, b = -1, c = -3)
```

`return()` can appear anywhere in the body: the function stops as soon as it
reaches `return()`, and the function output is whatever `return()` contains.
Note that running `return(solution_1, solution_2)` will produce an error because
the result of a function must always be a single object. This is why we combined
both solutions from `solve_quadratic()` into a single vector.
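
As a minimal sketch of what "anywhere in the body" implies, the hypothetical
function `early_exit()` below places `return()` on its first line. **R** stops
running the function at that point, so the second line never runs.

```{r}
early_exit <- function() {
    return("first") # the function stops here and outputs "first"
    "second"        # never reached
}
early_exit()
```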

Our function can have as many arguments as we like. It is enough to add their
names, separated by commas, in the parentheses that follow `function`. When
the function runs, **R** will replace each argument name in the function body
with the corresponding value that we supply. If we don't supply a value, **R**
will replace the argument name with the argument's default value (if we defined
one).

We can also use our functions inside other functions. Imagine that we want to multiply the solutions to our equation by an arbitrary value (maybe because we want to convert the units of $x$ to something else). And let's pretend that, by default, we want to double the solutions. We can write another function that first calls `solve_quadratic()` and then multiplies the result by our arbitrary number.

```{r}
multiply_solutions = function(a, b, c, multiplier = 2) {
    solve_quadratic(a, b, c) * multiplier
}
multiply_solutions(a = 1, b = -1, c = -3)
multiply_solutions(a = 1, b = -1, c = -3, multiplier = 10)
```

::: {.callout-note}
## Objects created in functions disappear
All of the objects that we create inside a function will disappear after it finishes running. Only the output will remain, and to save it we need to assign it to an object.
:::
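
Here is a small sketch of this behavior. The function `make_temporary()` and
the object names are made up for illustration. Base **R**'s `exists()` reports
whether an object with a given name can be found.

```{r}
make_temporary <- function() {
    temporary_value <- 123 # exists only while the function runs
    temporary_value * 2
}
doubled <- make_temporary() # save the output in an object
doubled
exists("temporary_value")   # the object created inside the function is gone
```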

Being able to write our own functions is great, but we don't need to reinvent the wheel every time we need to do something that is not available in **R**'s default version. We can easily download packages from CRAN's online repositories to get many useful functions.

## Acquiring external packages

To install a package from CRAN, we can use the `install.packages()` function. For example, if we wanted to install the package `readxl` (for loading .xlsx files), we would need:

```{r installing readxl}
#| eval: false
install.packages("readxl", dependencies = TRUE)
```

The argument `dependencies` tells **R** whether it should also download other packages that `readxl` needs to work.

**R** may ask you to select a CRAN mirror which, simply put, refers to the location of the servers you want to download from. Choose a mirror close to where you are.

After installing a package, we need to load it into **R** before we can use its functions. To load the package `readxl`, we need to use the function `library()`, which will also load any other packages required to load `readxl` and may print additional information.
```{r loading readxl}
#| eval: false
library("readxl")
```

Whenever we start a new **R** session we need to load the packages we need. If we try to run a function without loading its package first, **R** will not be able to find it and will return an error message. 

Writing all our `library()` statements at the top of our **R** scripts is almost always a good idea because it reminds us to load the packages at the start of our sessions. It also helps others know quickly that they will need to have those packages installed to be able to use our code.

A package can contain many objects, but sometimes we only need one or two of its functions. To avoid loading the entire package, we can access a specific function directly by writing the package name followed by two colons and then the function name. For example:

```{r using specific function from library}
#| eval: false
readxl::read_xlsx("fake_data_file.xlsx")
```
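
The same notation works with any installed package. As a sketch that runs
without installing anything, the line below calls `file_ext()` from `tools`, a
package that ships with **R** but is not loaded by default. The file name is
the same fake one as above, and the file does not need to exist.

```{r}
tools::file_ext("fake_data_file.xlsx") # extract the file extension
```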


## Exercise

Let's try to practice all of the basic features of **R** that you just learned. Write a function that can simulate the roll of a pair of six-sided dice (let's call them red and blue) an arbitrary number of times. This function should return a vector with the values of the red die that were strictly larger than the corresponding values of the blue die. Hint: to simulate rolling a die, you can use the function `sample()`.

::: {.callout-tip collapse="true"}
## Exercise step by step

+ Step 1: define a function that takes one argument, `num_rolls`, representing 
the number of times to roll the dice.
+ Step 2: create two objects called `red` and `blue` to store the results from 
the dice rolls.
+ Step 3: simulate the dice rolls using function `sample()` (read its help page 
if you need to).
+ Step 4: create a vector of indices that identifies the values in the red die
that were larger than the values in the blue die.
+ Step 5: use this vector of indices to extract the values from the red die.
+ Step 6: make sure that your function returns the values you extracted in step 5.

:::

## References

Most of this section is based on ["Hands-On Programming with R"](https://rstudio-education.github.io/hopr/), by Garrett Grolemund; and on ["An Introduction to R"](https://intro2r.com/), by Alex Douglas, Deon Roos, Francesca Mancini, Ana Couto & David Lusseau.