simgen/popgen-selection.qmd at main · bodkan/simgen · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
---
#execute:
#  freeze: auto  # re-render only when source changes
eval: false
---

# Natural selection

---

**Note:** Unfortunately, the most recent SLiM v4.3 is [a little broken](https://github.com/MesserLab/SLiM/issues/495),
largely sabotaging the second half of this exercise in which _slendr_ uses SLiM
for non-neutral simulations. That said, the first half of the exercise works
perfectly fine and, once SLiM v5.0 comes out (soon), everything from the first
half will apply to the selection simulations. Alternatively, if you have the
option, use SLiM 4.2.x.

---

The primary motivation for designing _slendr_ was to make demographic modelling
in R as trivially easy and fast as possible, focusing exclusively on neutral
models. However, as _slendr_ became popular, people have been asking for
the possibility of simulating natural selection. After all, a large
part of _slendr_'s functionality deals with population genetic [models across
geographical landscapes](https://www.slendr.net/articles/vignette-06-locations.html),
which requires SLiM. So why not support selection simulations using _slendr_
as well?

In December 2024 I caved in and added support for modifying
_slendr_ demographic models with bits of SLiM code, which allows simulating
pretty much any arbitrary selection scenario you might be interested in.

This exercise is a quick demonstration of how this works and how you might
simulate selection using _slendr_. We will do this using another toy
model of ancient human history, which we will first use as a basis for simulating
the frequency trajectory of an allele under positive selection, and then implementing a toy selection scan using Tajima's D.

To speed things up, **create a new `selection.R` script and copy the following
code as a starting point for this exercise**:

```{r}
#| collapse: true
library(slendr)
init_env(quiet = TRUE)

# This line sources a script in which I provide a few useful helper functions
# which you can use in this exercise
source(here::here("files/popgen/popgen_utils.R"))

# African ancestral population
afr <- population("AFR", time = 65000, N = 5000)

# First migrants out of Africa
ooa <- population("OOA", parent = afr, time = 60000, N = 5000, remove = 27000) %>%
  resize(N = 2000, time = 40000, how = "step")

# Eastern hunter-gatherers
ehg <- population("EHG", parent = ooa, time = 28000, N = 5000, remove = 6000)

# European population
eur <- population("EUR", parent = ehg, time = 25000, N = 5000) %>%
  resize(N = 10000, how = "exponential", time = 5000, end = 0)

# Anatolian farmers
ana <- population("ANA", time = 28000, N = 5000, parent = ooa, remove = 4000)

# Yamnaya steppe population
yam <- population("YAM", time = 8000, N = 5000, parent = ehg, remove = 2500)

# Define gene-flow events
gf <- list(
  gene_flow(from = ana, to = yam, rate = 0.75, start = 7500, end = 6000),
  gene_flow(from = ana, to = eur, rate = 0.5, start = 6000, end = 5000),
  gene_flow(from = yam, to = eur, rate = 0.6, start = 4000, end = 3500)
)

# Compile all populations into a single slendr model object
model <- compile_model(
  populations = list(afr, ooa, ehg, eur, ana, yam),
  gene_flow = gf, generation_time = 30
)

# Schedule the sampling from four European populations roughly before their
# disappearance (or before the end of the simulation)
schedule <- rbind(
  schedule_sampling(model, times = 0, list(eur, 50)),
  schedule_sampling(model, times = 6000, list(ehg, 50)),
  schedule_sampling(model, times = 4000, list(ana, 50)),
  schedule_sampling(model, times = 2500, list(yam, 50))
)
```

**Next, visualize the demographic model.** If you did a bit of work in human
population genetics, you might recognize it as a very simplified model
of demographic history of Europe over the past 50 thousand years or so.
As you can see, we are recording 50 individuals from four populations -- for
Europeans we sample 50 individuals at "present-day", for the remaining populations
we're recording 50 individuals just before their disappearance. Also note that
there's quite a bit of gene-flow! This was an important thing we've learned about
human history in the past 10 years or so -- everyone is mixed with pretty much
everyone, there isn't (and never was) anything as a "pure population".

::: {.aside}
**Note:** We didn't discuss it earlier, but _slendr_ also provides the option to
specify a `remove =` argument in a `population()` call which instructs the
simulation engine to delete a population from a simulation at a given point.
For our `msprime()` simulations in earlier examples it wasn't really important,
but for the `slim()` simulation we will be running below, we want to make a
population extinct at a certain timepoint. Which is why our ancient populations
in the starting script model have the `remove =` parameter specified.
:::

```{r}
#| fig-width: 6
#| fig-height: 6
plot_model(model, proportions = TRUE, samples = schedule)
```


### Part 1: Simulating a tree sequence and computing Tajima's D

Although the point of this exercise is to simulate selection, let's first
simulate a normal neutral model using slendr's `msprime()` engine as a sanity
check. **Simulate 10 Mb of sequence with a recombination rate `1e-8` and a
sampling `schedule` defined above.** Let's not worry about adding any mutations, just to change things up a little bit. We'll be working with
branch-based statistics here (which means adding `mode = "branch"` whenever
we will be computing a statistic, such as Tajima's D).

::: {.callout-note collapse="true" icon=false}
#### Click to see the solution

```{r}
ts <- msprime(model, sequence_length = 10e6, recombination_rate = 1e-8, samples = schedule)

ts # no mutations!
```
:::

**Inspect the table of all individuals recorded in our tree sequence using
the function `ts_samples()`, making sure we have all the individuals scheduled
for tree-sequence recording.** (Again, there's no such a thing as too many
sanity checks when doing research!)

::: {.callout-note collapse="true" icon=false}
#### Click to see the solution

```{r}
ts_samples(ts)

library(dplyr)
ts_samples(ts) %>% group_by(pop, time) %>% tally
```
:::


As you've already learned in an earlier exercise, _tskit_ functions in _slendr_
generally operate on vectors
(or lists) of individual names, like those produced by `ts_names()` above.
**Get a vector of names of individuals in every population recorded in the
tree sequence, then use this to compute Tajima's D using the _slendr_ function
`ts_tajima()`.** (Use the same approach as you have with `ts_diversity()` or
`ts_divergence()` above, using the list of names of individuals as the
`sample_sets =` argument for `ts_tajima()`). **Do you see any striking
differences in the Tajima's D values across populations? Check [this](https://en.wikipedia.org/wiki/Tajima%27s_D#Interpreting_Tajima's_D)
for some general guidance.**

::: {.callout-note collapse="true" icon=false}
#### Click to see the solution

```{r}
samples <- ts_names(ts, split = "pop")
samples

# Compute genome-wide Tajima's D for each population -- note that we don't
# expect to see any significant differences because no population experienced
# natural selection (yet)
ts_tajima(ts, sample_sets = samples, mode = "branch")
```
:::


## Part 2: Computing Tajima's D in windows

Let's take this one step forward. Even if there is a locus under positive selection
somewhere along our chromosome, it might be quite unlikely that we would find a
Tajima's D value significant enough for the entire chromosome (which is basically
what we did in Part 1 now). Fortunately, thanks to the flexibility of
the _tskit_ module, the _slendr_ function  `ts_tajima()` has an argument
`windows =`, which allows us to specify the coordinates of windows into which
a sequence should be broken into, with Tajima's D computed separately for each
window. Perhaps this will allow us to see the impact of positive selection
after we get to adding selection to our model. So let's first built some code
towards that.

**Define a variable `windows` which will contain a vector of coordinates of
100 windows, starting at position `0`, and ending at position `10e6` (i.e., the end
of our chromosome). Then provide this variable as the `windows =` argument of
`ts_tajima()` on a new, separate line of your script. Save the result of
`ts_tajima()` into the variable `tajima_wins`, and inspect its contents in the
R console.**

**Hint:** You can use the R function `seq()` and its argument `length.out = 100`,
to create the coordinates of window boundaries very easily.

::: {.callout-note collapse="true" icon=false}
#### Click to see the solution

```{r}
# Pre-compute genomic windows for window-based computation of Tajima's D
windows <- round(seq(0, ts$sequence_length, length.out = 100))
windows

# Compute genome-wide Tajima's D for each population in individual windows
tajima_wins <- ts_tajima(ts, sample_sets = samples, windows = windows, mode = "branch")
tajima_wins

# You can see that the format of the result is slightly strange, with the
# `D` column containing a vector of numbers (this is done for conciseness)
tajima_wins[1, ]$D
```
:::

The default output format of `ts_tajima()` is not super user-friendly. **Process
the result using a helper function `process_tajima(tajima_wins)` that I provided
for you (perhaps save it as `tajima_df`), and visualize it using another
of my helper functions `plot_tajima(tajima_df)`.**

::: aside
**Note:** Making the `process_tajima()` and `plot_tajima()` function available
in your R code is the purpose of the `source(here::here("files/popgen/popgen_utils.R"))` command
at the beginning of your script for this exercise.
:::

::: {.callout-note collapse="true" icon=false}
#### Click to see the solution

```{r}
# The helper function `process_tajima()` reformats the results into a normal
# data frame, this time with a new column `window` which indicates the index
# of the window that each `D` value was computed in
tajima_df <- process_tajima(tajima_wins)
tajima_df

# Now let's visualize the window-based Tajima's D along the simulated genome
# using another helper function `plot_tajima()`
plot_tajima(tajima_df)
```

It's no surprise that we don't see any Tajima's D outliers in any of our
windows, because we're still working with a tree sequence produced by our
a purely neutral simulation. But we have everything set up for the next part,
in which we will add selection acting on a beneficial allele.

:::


## Part 3: Adding positive selection to the base demographic model

Although primarily designed for neutral demographic models, _slendr_ allows
optional simulation of natural selection by providing a "SLiM extension code
snippet" with customization SLiM code as an optional argument `extension =`
of `compile_model()` (a function you're closely familiar with at this point).

Unfortunately we don't have any space to explain SLiM here (and I have no idea,
at the time of writing, whether or not you will have worked with SLiM earlier
in this workshop). Suffice to say that SLiM is another very popular population
genetic simulator software which allows simulation of selection, and which
requires you to write custom code in a different programming language called
Eidos.


**Take a look at the file `slim_extension.txt` provided in your working
directory (it's also part of the GitHub repository [here](https://github.com/bodkan/simgen/blob/main/slim_extension.txt)).
If you worked with SLiM before, glance through the script casually and see
if it makes any sense to you. If you have not worked with SLiM before,
look for the strange `{{elements}}` in curly brackets in the first ten lines
of the script.** Those are the parameters of the selection model we will be
customizing the standard neutral demographic model we started with in the next step.

Specifically, when you inspect the `slim_extension.txt` file, you can see
that this "SLiM extension script" I provided for you has three parameters:

- `origin_pop` -- in which population should a beneficial allele appear,
- `s` -- what should be the selection coefficient of the beneficial allele, and
- `onset_time` -- at which time should the allele appear in the `origin_pop`.

However, at the start, the SLiM extension snippet doesn't contain any concrete
values of those parameters, but only their `{{origin_pop}}`, `{{s}}`, and
`{{onset_time}}` placeholders.

**Use the _slendr_ function `substitute_values()` to substitute concrete values
for those parameters like this:**

```{r}
extension <- substitute_values(
  template = here::here("files/popgen/slim_extension.txt"),
  origin_pop = "EUR",
  s = 0.15,
  onset_time = 12000
)
extension
```

You can see that `substitute_values()` returned a path to a file. **Take a look
at that file in your terminal -- you should see each of the three `{{placeholder}}`
parameters replaced with a concrete given value.**

::: {.callout-note collapse="true" icon=false}
#### Click to see the solution

Let's take a look at the first 15 lines of the extension file before and
after calling `substitute_values()`. We'll do this in R for simplicity, but
you can use `less` in plain unix terminal.

**Before -- see the {{placeholder}} parameters in their original form:**
```{r}
#| echo: false
cat(paste(readLines(here::here("files/popgen/slim_extension.txt"))[1:11], collapse = "\n"))
```

**After -- see the {{placeholder}} parameters with concrete values!**
```{r}
#| echo: false
cat(paste(readLines(extension)[1:11], collapse = "\n"))
```
:::

And that's all the extra work we need to turn our purely neutral demographic
_slendr_ model into a model which includes natural selection! (In this case,
only a simple selection acting on a single locus, as you'll see later, but
this can be generalized to any imaginable selection scenario.)

How do we use the SLiM extension for our simulation? It's very simple -- we
just have to provide the `extension` variable as an additional argument of
good old `compile_model()`. This will compile a new _slendr_ model which will
now include the new functionality for simulating natural selection:

**Compile a new `model` of the history of populations `afr`, `ooa`, `ehg`,
etc., by following the instructions above, providing a new `extension =`
argument to the `compile_model()` function.**

::: {.callout-note collapse="true" icon=false}
#### Click to see the solution

```{r}
model <- compile_model(
  populations = list(afr, ooa, ehg, eur, ana, yam),
  gene_flow = gf, generation_time = 30,
  extension = extension   # <======== this is missing in the neutral example!
)
```
:::


## Part 4: Running a selection simulation using `slim()`

Now we can finally run our selection simulation!

There are two modifications to our previous simulation workflows:

1. Because we need to run a non-neutral simulation, we have to switch from using
the `msprime()` _slendr_ engine to `slim()`. The latter can still interpret the
same demographic model we programmed in R, just like the `msprime()` engine can,
but will run the model using SLiM (and thus leveraging the new SLiM extension code
that we have customized using `substitute_values()` above). We simply do this by
switching from this:

```{r}
#| eval: false
ts <- msprime(model, sequence_length = 10e6, recombination_rate = 1e-8, samples = schedule)
```

to this:

```{r}
#| eval: false
ts <- slim(model, sequence_length = 10e6, recombination_rate = 1e-8, samples = schedule)
```

As you can see, you don't have to modify anything in your model code, just
switching from `msprime` to `slim` in the line of code which produces the
simulation result.

2. The customized model will not only produce a tree sequence, but will
also generate a table of allele frequencies in each population (SLiM experts
might have noticed the revelant SLiM code when they were inspecting
[`slim_extension.txt`](https://github.com/bodkan/simgen/blob/main/slim_extension.txt)). We need to be able to load both of these files after
the simulation and thus need a path to a location we can find those files.
We can do this by calling the `slim()` function as `path <- slim(..., path = TRUE)`
(so with the extra `path =` argument). This will return a path to where the
`slim()` engine saved all files with our desired results.

**Run a simulation from the modified model of selection with the `slim()` engine
as instructed in points number 1. and 2. above, then use the `list.files(path)`
function in R to take a look in the directory. Which files were produced by
the simulation?**

::: {.callout-note collapse="true" icon=false}
#### Click to see the solution (you have a working SLiM installation)

```{r}
# tstart <- Sys.time()
path <- slim(model, sequence_length = 10e6, recombination_rate = 1e-8, samples = schedule, path = TRUE, random_seed = 59879916)
# tend <- Sys.time()
# tend - tstart # Time difference of 38.82014 secs

# We can verify that the path not only contains a tree-sequence file but also
# the table of allele frequencies.
list.files(path)
```

We can see that the `slim()` simulation generated a tree-sequence file (just
like in previous exercises focused on `msprime()`) but it also created a new
file -- this was done by the SLiM customization snippet we provided to
`compile_model()`.
:::

::: {.callout-note collapse="true" icon=false}
#### Click to see the solution (you don't have a working SLiM installation _or_ the simulation takes too long)

```{r}
# If you don't have SLiM set up, just use the simulated results from my own
# run of the same simulation
path <- here::here("files/popgen/selection")

# We can verify that the path not only contains a tree-sequence file but also
# the table of allele frequencies.
list.files(path)

ts <- ts_read(file.path(path, "slim.trees"), model)
```

We can see that the `slim()` simulation generated a tree-sequence file (just
like in previous exercises focused on `msprime()`) but it also created a new
file -- this was done by the SLiM customization snippet we provided to
`compile_model()`.
:::

## Part 5: Investigating allele frequency trajectories

**Use another helper function `read_trajectory(path)` which I provided for this
exercise to read the simulated frequency trajectories of the positively
selected mutation in all of our populations into a variable `traj_df`. Then
run a second helper function `plot_trajectory(traj_df)` to inspect the trajectories
visually.**

**Recall that you used the function `substitute_values()` to parametrize your
selection model so that the allele under selection occurs in Europeans 15 thousand
years ago, and is programmed to be under very strong selection of $s = 0.15$.
Do the trajectories visualized by `plot_trajectory()` make sense given the
demographic model of European prehistory plotted above?**

::: {.callout-note collapse="true" icon=false}
#### Click to see the solution

```{r}
traj_df <- read_trajectory(path)
traj_df

plot_trajectory(traj_df)

# Comparing the trajectories side-by-side with the demographic model reveals
# some obvious patterns of both selection and demographic history.
plot_grid(
  plot_model(model),
  plot_trajectory(traj_df),
  nrow = 1, rel_widths = c(0.7, 1)
)
```

We can see that the beneficial allele which appeared in the European population
was under _extremely strong selection_ (look how its allele frequency shoots
up immediately after its first appearance!). However, we can also se how
the following demographic history with multiple admixture events kept "diluting"
the allele frequency (indicated by the dips in the trajectory).

This is the kind of _slendr_ simulation which could be also very useful for simulation-based
inference, like we did in the previous exercise. Just imagine having a comparable
aDNA time series data with empirical allele frequency trajectory over time and
using it in an ABC setting!
:::

## Part 6: Tajima's D (genome-wide and window-based) from the selection model

Recall that your simulation run saved results in the location stored in the
`path` variable:

```{r}
list.files(path)
```

From this `path`, we've already successfuly investigated the frequency trajectories.

Now let's compute Tajima's D on the tree sequence simulated from our selection
model. Hopefully we should see an interesting pattern in our selection scan?
For instance, we don't know yet _where_ in the genome is the putative locus
under selection!

To read a tree sequence simulated with `slim()` by our customized selection setup,
we need to do a bit of work. To simplify things a bit, here's the R code which makes
it possible to do. Just copy it in your `selection.R` script as it is:

```{r}
# Let's use my own saved simulation results, so that we're all on the
# same page going forward
path <- here::here("files/popgen/selection")

ts <-
  file.path(path, "slim.trees") %>%  # 1. compose full path to the slim.trees file
  ts_read(model) %>%                 # 2. read the tree sequence file into R
  ts_recapitate(Ne = 5000, recombination_rate = 1e-8) # 3. perform recapitation
```

Very briefly, because our tree sequence was generated by SLiM, it's very likely
that not all genealogies along the simulated genome will be fully coalesced
(i.e., not all tree will have a single root). To explain why this is the case
is out of the scope of this session, but read [here](https://tskit.dev/pyslim/docs/latest/tutorial.html) if you're interested
in learning more. For the time being, it suffices to say that we can pass the
(uncoalesced) tree sequence into the `ts_recapitate()` function, which then
takes a SLiM tree sequence and simulates all necessary "ancestral history" that
was missing on the uncoalesced trees, thus ensuring that the entire tree
sequence is fully coalesced and can be correctly computed on.

**Now that you have a `ts` tree sequence object resulting from a new selection
simulation run, repeat the analyses of genome-wide and window-based Tajima's D
from _Part 1_ and _Part 2_ of this exercise, again using the provided helper
functions `process_tajima()` and `plot_tajima()`. Can you identify which locus
has been the likely focal point of the positive selection? Which population
shows evidence of selection? Which doesn't and why (look again at the
visualization of the demographic model above)?**

::: {.callout-note collapse="true" icon=false}
#### Click to see the solution

```{r}
samples <- ts_names(ts, split = "pop")
samples

# Overall Tajima's D across the 10Mb sequence still doesn't reveal any significant
# deviations even in case of selection (again, not entirely unsurprising)
ts_tajima(ts, sample_sets = samples, mode = "branch")
```


```{r}
# So let's look at the window-based computation again...
windows <- as.integer(seq(0, ts$sequence_length, length.out = 100))

# compute genome-wide Tajima's D for each population in individual windows
tajima_wins <- ts_tajima(ts, sample_sets = samples, windows = windows, mode = "branch")
tajima_df <- process_tajima(tajima_wins)

plot_tajima(tajima_df)
```

You should see a clear dip in Tajima's D around the midpoint of the DNA sequence,
but only in Europeans. The beneficial allele appeared in the European population,
and although the plot of the allele frequency trajectories shows that the selection
dynamics has been _dramatically_ affected by gene-flow events (generally causing
a repeated "dilution" of the selection signal in Europeans), there has never been
gene-flow (at least in our model) _from_ Europeans to other populations, so the
beneficial allele never had a chance to "make it" into those populations.

:::


:::::: {.callout-tip}
## Bonus exercises


#### Bonus 1: Examine the pattern of ancestry tracts along the simulated genome


::: {.callout-note collapse="true" icon=false}
#### Click to see the solution

```{r}
# tracts <- ts_tracts(ts, source = "ANA", target = "EUR")
```
:::


#### Bonus 2: Investigate the impact of recombination around the selected locus

Vary the uniform recombination rate and observe what happens with Tajima's D
in windows along the genome.

::: {.callout-note collapse="true" icon=false}
#### Click to see the solution

Solution: just modify the value of the `recombination_rate =` argument provided
to the `slim()` function above.
:::


#### Bonus 3: Simulate origin of the allele in EHG


Simulate the origin of the beneficial allele in the EHG population -- what
do the trajectories look like now? How does that change the Tajima's D
distribution along the genome in our European populations?

::: {.callout-note collapse="true" icon=false}
#### Click to see the solution

Use this extension in the `slim()` call, and repeat the rest of the
selection-based workflow in this exercise.

```{r}
#| eval: false
extension <- substitute_values(
  template = here::here("files/popgen/slim_extension.txt"),
  origin_pop = "EHG",
  s = 0.1,
  onset_time = 12000
)

model <- compile_model(
  populations = list(afr, ooa, ehg, eur, ana, yam),
  gene_flow = gf, generation_time = 30,
  extension = extension
)
```
:::


#### Bonus 4: Other statistics in windows

As a practice of your newly acquired tree-sequence computation skills with
_slendr_, calculate some other statistics in the same windows along the
simulated genome, visualize them yourself, and compare the results to the
window-based Tajima's D pattern. For instance, `ts_diversity()`, `ts_divergence()`,
or `ts_segregating()` might be quite interesting to look at.

::: {.callout-note collapse="true" icon=false}
#### Click to see the solution

Use the same tree sequence file you've computed Tajima's D on, and then
apply the functions `ts_diversity()`, `ts_divergence()`, and `ts_segregating()`
on that tree sequence.
:::


:::
<!-- End of Bonus exercises -->