# R bootcamp
(A few remarks and tips before the practical session)
# R is the best technology for doing computational science
## R has an incredible wealth of toolkits
The most famous is the [_tidyverse_](http://tidyverse.org) ecosystem for data science:
<center>{width="30%"}</center>
There are packages for machine learning
([Keras](https://tensorflow.rstudio.com/keras/),
[Tensorflow](http://tensorflow.rstudio.com/)),
spatial packages ([_sf_](https://r-spatial.github.io/sf/),
[_stars_](https://r-spatial.github.io/stars/)), packages
specific to research fields ([genomics](https://cran.r-project.org/web/views/Omics.html),
[ecology](https://cran.r-project.org/web/views/Environmetrics.html), etc.).
More than [23000](https://cran.r-project.org/web/packages/) packages total.
## R has awesome easy-to-use(!) tools for reproducibility
- [**Quarto** "authoring system"](https://quarto.org/) for writing automated
reports, slides, PDF documents, etc. (our "Topic #4"!)
- [_targets_](https://docs.ropensci.org/targets/) pipelining framework (possibly
the most powerful and flexible of its kind)
- [_tidyverse_](http://tidyverse.org) framework (particularly the
_dplyr_ R package introduced as "Topic 2/3") is designed to
facilitate building readable, easy-to-write processing pipelines
- R itself is a very powerful, flexible programming language
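To give a flavor of what such a pipeline looks like, here is a minimal sketch. It assumes the _dplyr_ package is installed and R is version 4.1 or later (for the native `|>` pipe). The built-in `mtcars` data set and the column name `mean_mpg` are chosen purely for illustration:

```r
library(dplyr)

# Read the pipeline top to bottom: take the data, keep some rows,
# group them, then compute one summary row per group
summary_tbl <- mtcars |>
  filter(cyl == 4) |>                      # keep only 4-cylinder cars
  group_by(gear) |>                        # one group per number of gears
  summarize(mean_mpg = mean(mpg), n = n()) # average mpg and count per group

summary_tbl
```

Each step takes a data frame and returns a data frame, which is what makes the steps easy to chain and to read.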
## The unfortunate way R is taught...
- Some slides on _"R as a calculator"_ (only half joking)
- Then straight into plotting histograms and computing t-tests
- Effectively treats computation / data science as a black box
. . .
- R was first created "by statisticians for statisticians" (1991)
- So this way of teaching R makes sense historically
. . .
- But teaching needs change in modern times:
- Our data is larger and more complex than in the 1990s
- Reproducibility requires proper programming skills
## Challenge of teaching programming
1. Programming is a skill, not knowledge to be transferred
2. Teaching R in a lecture format would mean 3 hours of torture
. . .
<center>**Today's "R bootcamp" session is designed to walk you through
fundamentals of R in an interactive form.**</center>
. . .
<br>A series of problems and solutions to develop an understanding of
what happens behind the scenes of data-science operations.
This will give you the tools and confidence to build "mental models".
# Still, a couple of practical tips
<br>
<center>(Having observed how many scientists use R in practice.)</center>
## Knowing RStudio well is like having a superpower
Don't treat it as just a text editor like `Notepad`.
It's a [starship Enterprise](https://laughingsquid.com/wp-content/uploads/2023/05/All-Enterprise-Bridges.jpg)
of data science at your fingertips. It's incredibly powerful and has a lot of
features.
. . .
<br>[This cheatsheet](cheatsheets/rstudio.pdf) has a lot of information,
but try to internalize <span style="background-color: #FFFF00">
keyboard shortcuts which I
highlighted in yellow</span> in the PDF.
<br>At first it will be annoying and slower to use the keyboard instead of the mouse, but trust
me. It will pay off in the long run.
## Read-Eval-Print Loop (REPL)
> [...] the user enters expressions (rather than an entire [computer program]),
the REPL evaluates them and displays the results [...] -- Wikipedia
An idea from ancient computers (1964!) with these functions:
1. **read** --- accepts a bit of code from a user (`1 + 2`)
2. **eval** --- evaluates the code (applies `+` on `1` and `2`, yielding `3`)
3. **print** --- prints the result `3` on the screen
Steps 1.-3. repeat in an infinite **loop**, until the program closes.
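The three steps can be imitated by hand in the R console. This is only a rough sketch of the idea using base R's `parse()`, `eval()`, and `print()`. The real console does more (error handling, incomplete input, and so on), and the variable names are just for illustration:

```r
# read: turn a piece of text typed by the user into an unevaluated expression
expr <- parse(text = "1 + 2")[[1]]

# eval: evaluate the expression (applies `+` to 1 and 2)
result <- eval(expr)

# print: show the result on the screen
print(result)
#> [1] 3
```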
. . .
<center><h3>The R console is a powerful REPL!</h3></center>
## The R console is like the ultimate piece of experimental lab equipment
R encourages a highly interactive workflow.
When I don't understand something, such as a piece of code I don't
get, I always type it into the REPL to build an intuition.
. . .
Doing data analysis is like playing detective, especially when tracking
down bugs and problems.
. . .
Form a hypothesis, run a tiny bit of R code to test the hypothesis.
Move forward based on the result you got.
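For example, suppose a summary comes back as `NA` and you suspect missing values are to blame. A minimal sketch of testing that hypothesis in the console (the vector `x` is made up for illustration):

```r
x <- c(1, 2, NA, 4)

# Hypothesis: a single NA is enough to turn the mean into NA
mean(x)
#> [1] NA

# Check: drop missing values and see whether the result changes
mean(x, na.rm = TRUE)
#> [1] 2.333333

# Follow-up: how many values are missing?
sum(is.na(x))
#> [1] 1
```

Each line takes a second to run, and each answer tells you what to try next.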
. . .
**I see a lot of experienced PhD students writing and running long code
top-to-bottom, instead of thinking methodically.**
## Built-in R help always has an answer!
All languages (and their packages) have documentation, sure.
But it's mostly scattered on the internet, often hard to find.
. . .
**R packages have a standardized documentation inside R!**
- Every function `func` has a manual page available via the command `?func`
. . .
**Every single such help page describes:**
1. Basic usage of the function
2. Which optional parameters can be given
3. Description of what the function does
4. **Runnable example code (!!!)**
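As a quick illustration, here are a few ways to reach the help page of the base R function `mean()` and its examples. `mean()` is picked only as a familiar example; the same calls should work for any documented function:

```r
?mean                 # open the manual page for mean()
help("mean")          # the same thing, written as a regular function call

example("mean")       # run the code from the "Examples" section of that page

# Get the example code as text instead of running it
ex_lines <- example("mean", give.lines = TRUE)
```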
## These manuals are amazingly helpful
<center>{width="50%"}</center>
<center><small>(Help for a function `ts_tajima()` from my R package.)</small></center>
## Consider switching the pane layout
In the RStudio menu `Tools` `->` `Global Options` `->` `Pane Layout`, set:
<center>{width="40%"}</center>
Maximum vertical space for code and easy switching between script
and R console (particularly with keyboard shortcuts).
# Let's get started!
1. Go to [www.bodkan.net/simgen](https://bodkan.net/simgen)
2. Click on _"R bootcamp"_ in the left panel---these are the materials
for this session (exercises, solutions, explanations)
3. The _"Cheatsheets and handouts"_ section in the left panel contains
a single-page version of these slides, plus RStudio and base R cheatsheets
for your reference
4. Open your RStudio and start working!