This repository has been archived by the owner on Jan 30, 2024. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathstack.Rmd
777 lines (479 loc) · 38.8 KB
/
stack.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
---
title: "Stack"
author:
name: "Maximilian Held"
affiliation: "Friedrich-Alexander Universität Erlangen-Nürnberg (FAU)"
date: "`r format(Sys.time(), '%d %B, %Y')`"
bibliography: library.bib
---
```{r setup, echo=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```
Here's the software stack we'll be using.
You should work yourself through this stack in the order suggested.
This is by no means the *only* way to do open science or data science with open source software, and recommended packages are likely to change over time.
The below R-based toolchain should be considered as merely *one* (out of several) consistent implementations of some best practices.
However, once participants have mastered this toolchain, they should find it relatively easy to adapt to other ecosystems.
All of the software will be [free and open source software](https://en.wikipedia.org/wiki/Free_and_open-source_software), but we will also be using some proprietary Software-as-a-Service ([Saas](https://en.wikipedia.org/wiki/Software_as_a_service)) offerings.
For each of the proprietary services, there *are* open-source and/or self-hosted alternatives, but these are often much less convenient (e.g. [self-hosted Jenkins](https://jenkins.io) vs [GitHub Actions](https://github.com/features/actions)), or they are much less popular in the community, and therefore less useful (e.g. [GitLab](GitLab) vs [GitHub](https://github.com)).
Relying on, or pushing proprietary services, especially in an education context, is always awkward, but the disadvantages can sometimes be outweighted by convenience and network effect advantages.
For some aspects of open source software development and open science, proprietary services -- especially GitHub and the [StackExchange network](https://stackoverflow.com) -- for better or for worse just *are* the only game in town.
In any event, *most* of what students will learn in this class is in free and open source software, and the remaining proprietary usage should easily translate to other, competing or open services.
<div class="alert alert-info" role="alert">
The stack is listed below in the **approximate order** in which software is covered in this class, and together with **mandatory study material** and **tasks**.
The first couple of topics and tools should be covered by everyone.
Later, more advanced topics can be chosen by students according to their interest, as illustrated in the below flowchart.
```{r, fig.cap="Flow Chart of Covered Topics"}
DiagrammeR::mermaid(diagram = "topics.mmd")
```
</div>
## Introduction
For the introductory session, participants should watch this video:
<div class="embed-responsive embed-responsive-16by9">
<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/cpbtcsGE0OA?rel=0" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>
</div>
Here's the corresponding [slide deck](https://speakerdeck.com/hadley/you-cant-do-data-science-in-a-gui).
### Installation {.alert .alert-info}
Participants should install all of the below software and sign up for all the below services **before the first class**.
If you are using Docker, *some* of the software is already contained in the image, though "local" installation may still be advisable.
The steps required for installation will depend on your platform and system setup.
### Basic Computing Literacy
<div class="alert alert-warning" role="alert">
... is expected.
To *start* this class you need to know your way around your own computer and commonly used technologies.
</div>
You should know, or easily find, the answer to questions such as:
- *In what directory (absolute path) are programs I use daily stored on computer?*
- *What is the operating system (OS) and version on my computer?*
- *In what directory (absolute path) do I store my files?*
- *Do I have sufficient privileges to install software?
If not, how can I get them?*
- *Which file format is better suited to editing and why: A `*.docx` or a `*.pdf`?*
- *Why do the search queries `jaguar car` and `jaguar -car` give different results on Google?*
- *Name at least 10 file types.*
- *What is the username I usually use on public-facing platforms?*
- *What is Two-Factor Authentification? (2FA)?*
- *How is my harddisc formatted?*
- *How can I upgrade my OS and frequently used software?*
- *How is the data on my computer protected from unauthorised access?*
- *What is my backup plan?*
- *What is a VPN client, and what do I need it for?*
If you feel like you need to brush up on some basic computing skills, these resources might be helpful:
- [Basic Computer Skills](https://edu.gcfglobal.org/en/basic-computer-skills/)
- WikiHow [Basic Computer Skills](https://www.wikihow.com/Category:Basic-Computer-Skills)
- Tech Republic [10 things...](https://www.techrepublic.com/blog/10-things/10-things-you-have-to-know-to-be-computer-literate/)
- Viking Code School [Terms to know](http://www.vikingcodeschool.com/web-development-basics/terms-to-know)
- [Free Linux tutorials](https://linuxjourney.com/)
- ... just google the answers to the above questions.
## Software Carpentry {#carpentry}
### Project Management {#pm}
<i class="fab fa-github fa-3x fa-pull-left"></i> <a class="btn btn-primary" href="https://github.com" role="button">Sign up to GitHub.com</a>
GitHub is a collaboration platform, code repository and git host (more on all of below) *along with some helpful project management tools*.
<div class="alert alert-info" role="alert">
Choose your username carefully: it should be easy to type, and clearly identify you.
If you already have such a public username on other platforms (say, twitter), consider reusing that.
Your username need not be your real name, but it makes things easier if it includes (parts of) your name.
</div>
Please flesh out your profile on GitHub.com and all of the accounts below a bit by adding a picture, a name and a short description of yourself.
<div class="alert alert-warning" role="alert">
Also be careful in choosing your [commit email address](https://help.github.com/articles/about-commit-email-addresses/).
This should be an email account you have access to forever.
</div>
<div class="alert alert-success" role="alert">
... and one last thing: By default, GitHub will notify you via E-Mail about pretty much every repository activity, which will result in a *lot* of email.
Here's how you can customize or disable [these notifications](https://help.github.com/articles/choosing-the-delivery-method-for-your-notifications/).
Make sure that you're not missing anything on GitHub.com either.
Here's a sensible set of defaults:
![Screenshot from [`github.com/settings/notifications`](https://github.com/settings/notifications)](img/notifications.png)
</div>
#### Resources
- [Getting Started with GitHub Video](https://www.youtube.com/watch?v=noZnOSpcjYY)
- [Mastering Issues](https://guides.github.com/features/issues/)
- [Advanced Formatting on GitHub](https://help.github.com/articles/working-with-advanced-formatting/)
- [Your profile](https://help.github.com/categories/setting-up-and-managing-your-github-profile/)
- [How to write good issues](https://wiredcraft.com/blog/how-we-write-our-github-issues)
#### Tasks
- Sign up to GitHub and say hi [in the welcome issue](https://github.com/soztag/fossos/issues/143)
### Community & Help {#help}
<i class="fab fa-stack-overflow fa-3x fa-pull-left"></i> <a class="btn btn-primary" href="https://stackoverflow.com" role="button">Sign up to StackOverflow</a>
<a class="btn btn-primary" href="http://community.rstudio.com" role="button">Sign up to RStudio community</a>
<div class="alert alert-info" role="alert">
This includes setting up a decent profile with picture, links etc.
Same username suggestions apply as above.
</div>
Aside from Google, these are two great places to get help, and to get involved in the community.
A lot of volunteers spend a lot of time on these sites, so it is very important not to waste their efforts, and to *only add quality content*, as defined by these sites.
#### Resources
- SE: [How to ask a good question](https://stackoverflow.com/help/how-to-ask)
- SE [How to create a Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) (that's a **reprex**)
- [R Bloggers tips for good reprexes](https://www.r-bloggers.com/three-tips-for-posting-good-questions-to-r-help-and-stack-overflow/)
- wikiHow: [How to ask a question...](https://www.wikihow.com/Ask-a-Question-on-Stack-Overflow)
- SE: [Code of conduct](https://stackoverflow.com/help/behavior)
- SE: [How not to be a spammer](https://stackoverflow.com/help/promotion)
- SE: [Help re: asking](https://stackoverflow.com/help/asking)
- SE: [Help re: answering](https://stackoverflow.com/help/answering)
- Community: [Welcome](https://community.rstudio.com/t/welcome-to-the-rstudio-community/8)
- SE: [Making a great R reprex](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)
- Tidyverse: [reprex package](https://reprex.tidyverse.org)
- code.likeagirl.io: [How to find a newcomer-friendly open source project](https://code.likeagirl.io/the-new-developers-guide-to-open-source-228ca257dd68)
#### Tasks
- Sign up to StackOverflow.
- Find an interesting question an StackOverflow and post it to the chat.
- Sign up to RStudio community and post your profile to the chat so we can follow each other.
- Find and interesting discussion on RStudio community and post it to the chat.
#### Additional Resources
In addition to StackExchange and RStudio Community, there are a couple of other platforms where the (very friendly) R community hangs out:
- [Twitter](https://twitter.com) `#rstats` hashtag.
Consider joining!
- Reddit subreddits:
- https://www.reddit.com/r/datascience/
- https://www.reddit.com/r/RStudio/
- https://www.reddit.com/r/Rlanguage/
- https://www.reddit.com/r/rstats/
- [Linear Digressions](http://lineardigressions.com) Podcast
- [R Weekly](https://rweekly.org) Blog
- [R Views](https://rviews.rstudio.com) Blog
### Markup Language {#markup}
<a class="btn btn-default" href="https://github.github.com/gfm/" role="button">GitHub Flavored Markdown Spec</a>
The full [GFM spec](https://github.github.com/gfm/) is just FYI; there's nothing to install here.
Plain text has many advantages (more on that later), but one glaring disadvantage:
it does not look very nice, and does not implement many of the typesetting conventions that have evolved since Gutenberg (say, **bold** face).
Markup syntaxes solve this problem.
Markup syntaxes are sets of conventions (as in `*something*` for highlighting) to structure human-generated *text* in a way that computers can operate on them, such as formatting a piece of text.
There are many, many, such markup languages out there, including **HTML** but also **Markdown** and **LaTeX**.
We will be focusing on Markdown as a source language, and then use open source tools (especially **Pandoc**), to render our source documents to all sorts of other formats, including PDF (via LaTeX), HTML (such as this website), but also Microsoft Word documents.
Markdown is a very lightweight markup language, that was designed to be maximally *human readable*, that is, looking meaningful *without* being compiled by a computer.
Most of the syntax takes its clues from how people have already formatted plain text, such as enclosing a `*word*` with `*` for *highlighting*.
Technically, Markdown is a convention for writing such files, as well as a program to convert such files into `HMTL`, as, for example, this website (which is written in a flavor of Markdown).
By convention, Markdown files use the `.md` file extension.
It's important to recognize that still, an `.md` is a *plain text file*.
You could open it with any text editor, or even change the extension to `*.txt` and nothing would change.
The extension `.md` serves merely to tell computers that the following plain text is marked up in markdown.
Markdown was (originally) quite a minimal standard, and has since branched out into a few specialised "flavors", offering additional features.
We will be using only *two* of these flavors: GitHub Flavored Markdown (GFM) and Pandoc's Markdown (more on that below).
StackOverflow, RStudio Community (Discourse), Gitter chat and many other services also support GFM.
GitHub, a leading code-hosting service, has extended the above original Markdown spec by a couple of additional features.
In addition to these formatting niceties, Github also implements some clever cross-referencing and autocompletion magic.
When using Github for source control and collaboration, you *really* must use these in issues, comments, commit messages etc. (they work *everywhere*).
#### Resources
- [Markdown Basics](https://help.github.com/articles/markdown-basics)
- [GFM Cheat Sheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet)
- Original [Markdown Spec](https://daringfireball.net/projects/markdown/) by John Gruber, who invented the syntax.
- [Mastering Markdown](https://guides.github.com/features/mastering-markdown/)
- [Markdown Tutorial](https://www.markdowntutorial.com)
- [GFM](https://help.github.com/articles/github-flavored-markdown)
- [Writing on GitHub](https://help.github.com/articles/writing-on-github)
#### Additional Resources
If you like, you can also install a program on your computer to render Markdown to HTML.
There are [plenty](http://mashable.com/2013/06/24/markdown-tools/) of choices, including the free [MarkdownPad](http://markdownpad.com) for Windows, and [Lightpaper](http://clockworkengine.com/lightpaper-mac/) for OS X.
If you don't want to install something, [Github](https://github.com) (see below) also offers a Markdown preview in its browser-based editor.
We will be using different programs going forward.
### Shell {#cli}
A shell is a command-line interface (CLI) to your computer (as opposed to a point-and-click graphical user interface, or GUI).
You may also know this as "the console", or "the terminal".
There are technically different *kinds* of shells, though the `bash` shell is the most widespread, and is often used interchangeably with *the* shell.
A lot of programs that we'll be using only run at the CLI, so it's important to know how to use it.
You can use the (Linux) shell that ships with our Docker container, but you should also know your way around the shell that ships with your OS (to, among other things, spin up a Docker image).
On **macOS**, **Linux**: Nothing to install, ships with a bash shell or something similar.
On **Windows**:
- Install [Git for Windows](https://gitforwindows.org) because that comes with at least a git shell.
Choose **git bash emulation** on install.
- If your version is >= Windows 10 Anniversary Update you can also install [Install the Windows Subsystem for Linux (WSL)](https://docs.microsoft.com/en-us/windows/wsl/install-win10) and use the [Windows 10 Bash Shell](https://www.howtogeek.com/249966/how-to-install-and-use-the-linux-bash-shell-on-windows-10/).
However this is a *separate* system inside your Windows installation, and the programs installed inside it may (as of 2019-01) *not* be used "normal" windows GUI programs.
If you don't know what this means, do not install the WSL; it can be very confusing.
If you like a fancier shell, you might want to look at the [oh-my-zsh](https://ohmyz.sh) project, which has some pretty cool features.
However this is strictly optional, will not be supported in class.
#### Resources
- [Command Line for Beginners](https://learntocodewith.me/getting-started/topics/command-line/)
- Viking Code School [Command Line crash course](http://www.vikingcodeschool.com/web-development-basics/a-command-line-crash-course)
- Shell chapter form [Happywithgitr](http://happygitwithr.com/shell.html)
- If you're on windows: [how to access ubuntu bash files in windows](https://www.howtogeek.com/261383/how-to-access-your-ubuntu-bash-files-in-windows-and-your-windows-system-drive-in-bash/)
- Video: [Linux Terminal commands and navigation](https://www.youtube.com/watch?v=AO0jzD1hpXc)
#### Additional Resources
- Learn enough [Command line tutorial](https://www.learnenough.com/command-line-tutorial)
- Codecademy course [Learn the Command Line](https://www.codecademy.com/learn/learn-the-command-line) (offers a free tier)
- DataCamp course [Introduction to Shell for Data Science](https://www.datacamp.com/courses/introduction-to-shell-for-data-science)
- [The Unix Shell (Carpentries)](http://swcarpentry.github.io/shell-novice/)
### Bash (Optional) {#bash}
It turns out that `bash`, the default shell on UNIX-type computers is *also* a **scripting language** upon itself.
[Scripting languages](https://en.wikipedia.org/wiki/Scripting_language) are programming languages which facilitate automated execution of tasks, such as, say, running a bunch of updates and then power cycling your computer.
`bash` isn't necessarily the greatest scripting language; especially for more complicated projects, "proper" scripting languages such as Python, Ruby or R might serve you better.
But `bash` has the advantage that it is available in (almost all) UNIX-type computing environments, so it's often the easiest way to automate steps.
It looks a bit arcane (because it is), but you don't need much to build powerful scripts that can save you a lot of time.
This entire topic and the below *additional resources* are recommended for advanced readers.
You won't need this starting out.
#### Additional Resources
- [The Unix Workbench](https://seankross.com/the-unix-workbench/index.html)
- [Writing Shell Scripts - Beginners Guide](https://medium.com/tech-tajawal/writing-shell-scripts-the-beginners-guide-4778e2c4f609)
- [Bash Boilerplate](http://bash3boilerplate.sh)
- [Bash programming by (counter-)example](http://matt.might.net/articles/bash-by-example/)
- [Tips and Tricks for good Bash Scripts](https://codeburst.io/13-tips-tricks-for-writing-shell-scripts-with-awesome-ux-19a525ae05ae)
- [Bash Scripting Best Practices](https://sap1ens.com/blog/2017/07/01/bash-scripting-best-practices/)
- [Writing Readable Bash Scripts](https://mads-hartmann.com/2017/06/16/writing-readable-bash-scripts.html)
- [Google Shell Style Guide](https://google.github.io/styleguide/shell.xml)
### Containerisation (optional) {#docker}
<i class="fab fa-docker fa-3x fa-pull-left"></i><a class="btn btn-info" href="https://www.docker.com/products/docker-desktop" role="button">Install Docker Desktop</a>
Docker is an open-source industry standard to define, provision and share computing environments, known as *containers*.
Containers allow you to run computing environments on other computers.
Containers are similar to virtual machines (a computer inside a computer), but slimmer and generally neater.
You may want to use Docker in class to quickly get a development environment, but it is also generally a helpful tool.
#### Resources
- Install Docker Deskop for [Windows](https://docs.docker.com/docker-for-windows/install/) or [macOS](https://docs.docker.com/docker-for-mac/) or [Linux](https://docs.docker.com/install/linux/docker-ce/ubuntu/).
**If you don't meet the system requirements (especially on Windows), you can install the legacy [Docker Toolbox](https://docs.docker.com/toolbox/overview/) (not recommended) (also see [#144](https://github.com/soztag/fossos/issues/144))**.
- [Docker Overview](https://docs.docker.com/engine/docker-overview/)
- [Get Started](https://docs.docker.com/get-started/)
(You only need subsections 1 and 2)
- [Docker Introduction](https://www.freecodecamp.org/news/a-beginner-friendly-introduction-to-containers-vms-and-docker-79a9e3e119b/)
- [Another Docker introduction](https://medium.com/@kelvin_sp/docker-introduction-what-you-need-to-know-to-start-creating-containers-8ffaf064930a)
- [Rocker Project](https://rocker-project.org)
#### Tasks
- Install and run a docker image:
1. Download and install [Docker Desktop Community](https://www.docker.com/products/docker-desktop).
You need to set up a docker ID as part of the download, and you should remember your account credentials.
2. Launch Docker Desktop.
A small whale symbol will show up in your task bar, but not much else.
3. Launch a system shell and type in
```
docker run --env=PASSWORD=yourpassword --rm --publish=8787:8787 rocker/verse
```
On macOS and Linux, your default system shell is an application called "Terminal".
On Windows, you can use the "command prompt" or "PowerShell" applications.
If you already have git installed on Windows, you can also use the git bash emulator.
Depending on how fast your internet connection is, this process will take a while to complete.
4. Open a webbrowser and point it to `http://localhost:8787`.
You should see the login window for the RStudio IDE.
You are in the browser, but are using running all computations on *your* machine through Docker.
This will also work if you are offline.
5. Type in `rstudio` as a username and your `PASSWORD` given in the above as a password.
6. You are now in the RStudio IDE.
7. To shut down, close the browser window and type `ctrl + c` in your shell.
- Learn the difference between a `Dockerfile`, an image and a container in the context of Docker.
#### Additional Resources
- [Docker for R and Reproducibility](https://colinfay.me/docker-r-reproducibility/)
- [Dockerfile reference](https://docs.docker.com/engine/reference/builder/)
- [Environment Management with Docker](https://environments.rstudio.com/docker)
- [Docker Command Line](https://docs.docker.com/engine/reference/commandline/cli/)
- Using [bind mounts](https://docs.docker.com/storage/bind-mounts/) and [volumes](https://docs.docker.com/storage/volumes/).
### Source Control Management (SCM) {#scm}
<i class="fas fa-code-branch fa-3x fa-rotate-180 fa-flip-vertical fa-pull-left"></i><a class="btn btn-info" href="https://git-scm.com" role="button">Install Git</a>
Git is just a CLI program.
It offers all the functionality of git, but you may also install a Git graphical user interface (GUI).
There plenty of those out there, but one of the easiest is the GitHub Desktop app from GitHub (available only for Windows and macOS).
<i class="fab fa-github fa-3x fa-pull-left"></i> <a class="btn btn-info" href="https://desktop.github.com" role="button">Install GitHub Desktop</a>
**You should install Git and GitHub Desktop even if you are using Docker**.
You also need to *configure* git on your machine, so that git knows you are and to allow you to authenticate against git hosts (GitHub.com in our case) and wherever else you are using Git (such as a SaaS):
- [Your local machine](https://help.github.com/articles/setting-your-username-in-git/)
- [RStudio Cloud](https://bren.zendesk.com/hc/en-us/articles/360015826731-How-to-connect-RStudio-Cloud-with-Github)
- Configuring git inside Docker isn't recommended; it can get difficult you must be careful not to disclose your credentials.
#### Resources
- [Version Control with Git (Carpentries)](http://swcarpentry.github.io/git-novice/)
- [Try Git](https://try.github.io/)
- Jenny Bryan's [Happy Git With R](http://happygitwithr.com) (*yes, you should study this completely -- until Chapter 33*)
- [Referencing issues in commit messages](https://help.github.com/articles/closing-issues-via-commit-messages)
- [Why you should use version control](https://stackoverflow.com/questions/1408450/why-should-i-use-version-control)
- [Version control for the solo analyst](https://stackoverflow.com/questions/2712421/r-and-version-control-for-the-solo-data-analyst)
- [Git/Github Guide from Karl Broman](http://kbroman.org/github_tutorial/)
- [Setting up a gitignore](https://www.atlassian.com/git/tutorials/saving-changes/gitignore)
- Jenny Bryan's ["Excuse me ..." article](https://peerj.com/preprints/3159/)
#### Additional Resources
- [Free DataCamp Course](https://www.datacamp.com/courses/introduction-to-git-for-data-science)
### Branching Model (Optional) {#gitflow}
<a class="btn btn-default" href="https://guides.github.com/introduction/flow/" role="button">GitHub Flow</a>
There is a varied set of practices and tools that have evolved *on top of* Git.
Together with the powerful git scm, it is these practices and tools, that make massively collaborative software development possible.
One of the simpler practices is GitHub Flow.
We will use it to learn the branch and pull-request model.
#### Resources
- [How to fork](https://guides.github.com/activities/forking/)
- [Understanding the GitHub Flow](https://desktop.github.com)
- [Resolving a Merge Conflict](https://help.github.com/articles/resolving-a-merge-conflict-using-the-command-line/)
- [Comparing Commits accross time](https://help.github.com/articles/comparing-commits-across-time/)
### Package Management (optional) {#pkg-mngt}
<a class="btn btn-default" href="https://en.wikipedia.org/wiki/APT_(Debian)" role="button">Linux: already ships with `apt`</a>
<a class="btn btn-info" href="https://brew.sh" role="button">macOS: install home`brew`</a>
<a class="btn btn-info" href="https://chocolatey.org" role="button">Windows: install chocolatey</a>
Installing and upgrading a lot of command line tools and their dependencies gets old quickly.
Package managers solve this problem; they provide a clean and elegant way to install (CLI) programs, and even allow you to quickly upgrade everything.
You will need to install a package manager independent of using the Docker image.
<div class="alert alert-warning" role="alert">
Consider installing the below software via your package manager.
Whether this is advisable or even possible for any given piece of software depends on the software and your operating system.
Google around for advice.
As a rule of thumb, "heavy" packages (such as LaTeX, Atom or R) are sometimes best installed "by hand".
</div>
Notice that LaTeX, Atom and R (all below) each have their own *internal* package managers (as do many other software ecosystems).
If you're installing a package *for* either of those, use the corresponding ecosystem package manager, *not* your system-wide program (= `brew`, `apt-get`, `cholocatey`).
### Text Editor (Optional) {#editor}
<i class="fab fa-react fa-3x fa-pull-left"></i> <a class="btn btn-info" href="https://atom.io" role="button">Install Atom</a>
Whenever we write something in this class, it will be in [plain text](http://en.wikipedia.org/wiki/Plain_text).
Plain text, roughly speaking, consists *directly* and only of letters, encoded in an open standard.
This may seem antiquated, but has several advantages:
- Plain text can easily be versioned by computer software such as [git](http://git-scm.com).
- Plain text is transparent to the user: it is *human-readable*.
For comparison, try opening a `*.doc` in a text editor, and see whether you can make out any meaning.
- Plain text is lightweight and robust.
File sizes and memory footprint are tiny.
- Plain text files future-proof your work and data.
`*.txt`, or, equivalently for data, `*.csv` can be opened and edited on pretty much any computer today, could be 30 years go, today, and probably still will be widely accessible in 30 years time.
Most operating systems ship with a text editor, but they are quite basic and can be cumbersome to use.
Specialized text editors (or just editors) offer more functionality geared towards technical writing or software development.
There are many editors out there, and people have [strong views on which is best](https://en.wikipedia.org/wiki/Editor_war).
In some ways, this is surprising, because of all the software used in collaborative writing or development, *editors* are the tool that needs the least standardisation.
Playing off the advantages of plain text files, everyone can use what works best for them, because they all output the *exact* same thing: a `*.txt`.
You are therefore free to choose your own text editor.
You can use the RStudio IDE that comes with our Docker image, but other editors are more fully featured.
Atom has the advantage of being relatively easy to use, free and open source and relatively widely supported.
It also comes with some nice Git(Hub) integration.
Atom, as most editors, has a modular design.
Many of its features are factored out to separate packages, some of which are contributed by external volunteers.
Here's a list of packages you might also want to install:
- `atom-beautify`
- `atom-html-preview`
- `document-outline`
- `git-plus`
- `language-knitr`
- `language-latex`
- `latex`
- `language-markdown`
- `merge-conflicts`
- `minimap-split-diff`
#### Resources
- [Atom Getting Started Video](https://www.youtube.com/watch?v=U5POoGSrtGg)
- [Atom Forums](https://discuss.atom.io/t/welcome-to-discuss-atom-io/4)
- [Atom Flight Manual](https://flight-manual.atom.io/getting-started/sections/why-atom/) (only Chapters 1-2!)
### R Integrated Development Environment (IDE) {#ide}
<a class="btn btn-info" href="https://www.rstudio.com/products/rstudio/download/#download" role="button">Install RStudio</a>
<div class="alert alert-warning" role="alert">
*Before* installing RStudio, you must first install R (see below).
We may, for now, not use R much by itself, but it can be easier to use all the other tools inside of RStudio, rather than separately.
</div>
If you are using the Docker container image, RStudio and R are already included.
Aside from *text editors*, there are also *integrated development environments* (IDE) (though this distinction has recently been blurring with the arrival of [Atom-IDE](https://ide.atom.io) and others).
IDEs are a little like text editors, in that they mostly let you edit plain text files, but they offer a lot of "training wheels" for programming and are often geared towards particular programming languages.
The leading IDE for R is called RStudio, by, confusingly, a company called RStudio.
We will be using the open source variant of RStudio (the IDE), but RStudio (the company) also sells commercial licenses to the IDE and other products.
If you are already deeply invested in an IDE or Editor (especially [vim](https://www.vim.org) or [emacs](https://www.gnu.org/software/emacs/)) you may also trick out that program to support R.
The [Emacs speaks statistics](https://ess.r-project.org) project has great support for R, but Emacs has a steep learning curve.
For most everyone, RStudio will therefore be the strongly recommended choice.
#### Resources
- RStudio [Cheat Sheet](https://www.rstudio.com/resources/cheatsheets/#ide)
- Using [Projects in RStudio](https://r4ds.had.co.nz/workflow-projects.html)
- [Connecting RStudio and Git](http://happygitwithr.com/rstudio-git-github.html)
- [RStudio Projects documentation](https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects)
- [Good enough practices](https://swcarpentry.github.io/good-enough-practices-in-scientific-computing/)
- [The panes in RStudio](https://campus.datacamp.com/courses/working-with-the-rstudio-ide-part-1/orientation?ex=5)
### Technical & Scientific Authoring (optional) {#sciwri}
**All software in this section is included in the Docker image.**
#### Document Conversion (optional) {#conversion}
<i class="far fa-file-alt fa-3x fa-pull-left"></i> <a class="btn btn-info" href="https://pandoc.org" role="button">Install Pandoc</a>
We'll often want to convert documents from and to different markup formats.
For that purpose, we'll use pandoc.
[Pandoc](https://pandoc.org) is, originally, a kind of swiss army knife for text document formats, such as, say, between Microsoft Word and HTML.
But as part of this work, Pandoc has *also* defined its own extension (flavor) to Markdown (largely compatible with GFM), including such features as footnotes, captions, references, and other aspects important for technical and scientific writing.
You should *both* learn to *use* Pandoc at the CLI as well as to *write* in the corresponding Pandoc's Markdown style.
##### Resources
- [Pandoc User's Guide](https://pandoc.org/MANUAL.html)
- *especially* [Pandoc's Markdown](https://pandoc.org/MANUAL.html#pandocs-markdown)
#### Typesetting (optional) {#typesetting}
<i class="far fa-file-alt fa-3x fa-pull-left"></i> <a class="btn btn-info" href="https://www.latex-project.org" role="button">Install LaTeX</a>
[(La)TeX](http://www.latex-project.org) is strictly speaking a *typesetting* program, which can create beautiful documents.
It has extensive support for all sorts of domain-specific typographic niceties, and is used a lot by academics, especially in math and sciences because.
However, because LaTeX is quite cumbersome to compose and tends to distract writing with a lot of bells and whistles, we will *not* learn to write LaTeX directly "by hand".
Instead, we will be using Pandoc to compile our Pandoc Markdown source *to* PDF (via LaTeX), and, because LaTeX can be slow to compile, we will only do so rarely and towards the end of any given project.
Still, it is important to learn some of the basics of LaTeX to use it programmatically.
##### Resources:
- Overleaf [Learn LaTeX in 30 Minutes](https://www.overleaf.com/learn/latex/Learn_LaTeX_in_30_minutes)
- [LaTeX cheat sheet](http://users.dickinson.edu/~richesod/latex/latexcheatsheet.pdf)
### Bibliography Management (optional) {#bib}
<a class="btn btn-info" href="https://github.com/jgm/pandoc-citeproc" role="button">Install Pandoc Citeproc</a>
<a class="btn btn-default" href="biblio.html" role="button">Install Zotero</a>
Bibliography management is *not* the focus of this class, but you can learn more about it [here](biblio.html).
It is also one of those tools, where there is no strong reason to standardize on any one program, so as long as the bibliography manager exports to one of the formats [that pandoc can ingest](https://pandoc.org/MANUAL.html#citations).
Check if your bibliography manager can export to at least one of these formats.
If you have a choice, a BibTeX or BibLaTeX file (confusingly both named `*.bib`) are preferable.
#### Resources:
- [Pandoc's Citeproc documentation](https://pandoc.org/MANUAL.html#citations)
## Introductory R
**All software in this section is included in the Docker image.**
### "Base" R {#base-r}
<a class="btn btn-info" href="https://www.r-project.org" role="button">Install R</a>
#### Resources
- Work through the RStudio Cloud [primers](https://rstudio.cloud/learn/primers)
- DataCamp [Introduction to R](https://www.datacamp.com/courses/free-introduction-to-r) (free tier)
- [R Cookbook](http://www.cookbook-r.com)
- Codecademy [Learn R: Introduction](https://www.codecademy.com/learn/learn-r/modules/learn-r-introduction) (7 days trial)
#### Additional Resources
- Check out swirl for [learning R, in R](https://swirlstats.com).
- Statistics in R [Coursera course](https://www.coursera.org/specializations/statistics)
- [Data Analysis with R](https://eu.udacity.com/course/data-analysis-with-r--ud651)
### Literate Programming {#literate}
<a class="btn btn-info" href="https://yihui.name/knitr/" role="button">Install knitr</a>
<a class="btn btn-info" href="https://rmarkdown.rstudio.com" role="button">Install RMarkdown</a>
#### Resources
- Yihui Xie [knitr website](https://yihui.name/knitr/)
- Introduction to [R Markdown](https://rmarkdown.rstudio.com/lesson-1.html)
- RStudio [R Markdown](https://rmarkdown.rstudio.com)
- Yihui Xie et al.: [The definitive RMarkdown guide](https://bookdown.org/yihui/rmarkdown/)
- [R Markdown Cheat sheet](https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf) and [this](https://rmarkdown.rstudio.com/lesson-15.html)
## Intermediate R {#intermediate-r}
**All software in this section is included in the Docker image.**
### Tidyverse R
<a class="btn btn-info" href="https://www.tidyverse.org" role="button">Install dplyr, tidyr, readr, tibble, purrr and stringr</a>
#### Resources
- [R for Reproducible Scientific Analysis (Carpentries)](http://swcarpentry.github.io/r-novice-gapminder/)
- [R for Reproducible Science (Merely Useful)](https://merely-useful.github.io/r-rse/)
- [Novice R (Merely Useful)](https://merely-useful.github.io/r/)
- Documentation for the above packages.
- List of [resources here](https://www.tidyverse.org/learn/).
- Hadley Wickham and Garrett Grolemund's [R for Data Science](https://r4ds.had.co.nz) (read only the chapters that you are interested in).
### Plots
<a class="btn btn-info" href="https://ggplot2.tidyverse.org" role="button">Install ggplot2</a>
#### Resources
- Documentation for [ggplot2](https://ggplot2.tidyverse.org).
## Interactive R {#interactive-r}
**All software in this section is included in the Docker image.**
### HTML, JS & CSS (*optional*)
The below packages for (web) interactivity in R try to abstract away as much as possible the underlying web technologies (HTML, JavaScript and CSS).
You can use them without knowing anything about this stack, but you can accomplish more and understand them in a deeper way if you have at least a cursory understanding of how these technologies work.
Covering them in any depth, or even listing good resources (of which there are gazillions) is beyond the scope of this class, so these should be considered mere starting points.
- very [gentle introduction](https://internetingishard.com/html-and-css/introduction) by some webpage
- [w3schools tutorials](https://www.w3schools.com): doesn't look like much, but covers it pretty much all.
- free [udacity intro to html and css](https://in.udacity.com/course/intro-to-html-and-css--ud304).
- [list of links from quora](https://www.quora.com/What-is-the-best-way-to-learn-HTML-CSS-and-JavaScript)
### Interactive Graphics
<a class="btn btn-info" href="https://plot.ly/r/" role="button">Install plotly for R</a>
<a class="btn btn-info" href="http://www.htmlwidgets.org" role="button">Install htmlwidgets</a>
<a class="btn btn-info" href="https://rmarkdown.rstudio.com/flexdashboard/" role="button">Install flexdashboard</a>
#### Resources
- [Plotly documentation](https://plot.ly/r/)
- [HTML Widgets documentation](http://www.htmlwidgets.org)
- [flexdashboard documentation](https://rmarkdown.rstudio.com/flexdashboard/)
#### Additional Resources
- [Crosstalk documentation](http://rstudio.github.io/crosstalk/)
### Interactive Webapps
<a class="btn btn-info" href="https://shiny.rstudio.com" role="button">Install shiny</a>
#### Resources
- [Free DataCamp course](https://www.datacamp.com/?tap_a=5644-dce66f&tap_s=213387-418a02)
- Any of the materials featured on [shiny.rstudio.com](https://shiny.rstudio.com) -- whatever works best for you.
## Advanced R {#adv-r}
- [Programming with R (Carpentries)](http://swcarpentry.github.io/r-novice-inflammation/)
- Hadley Wickham's [Advanced R](https://adv-r.hadley.nz)
## Cloud Computing with R {#cloud}
### Continuous Integration & Development (CI/CD)
<a class="btn btn-info" href="https://travis-ci.com" role="button">Sign up to Travis CI</a>
#### Resources
- Julia Silge's [Beginners Guide](https://juliasilge.com/blog/beginners-guide-to-travis/)
- Travis CI: [Building an R Project](https://docs.travis-ci.com/user/languages/r/)
### The Cloudyr Project
tba.
## Reproducible Research
### Defensive Computing
tba.
### Storing Datasets
tba.
### Publishing results
tba.
### Dependency Management
<a class="btn btn-info" href="https://rstudio.github.io/packrat/" role="button">Install packrat</a>
## Package development
- Really "just" Hadley Wickham's [R Packages](http://r-pkgs.had.co.nz)