Add section on modes of parallelization.
Fix spelling errors and add palmer penguins logo.
mbjones committed Oct 25, 2023
1 parent 988b89b commit 2d965bd
Showing 4 changed files with 21 additions and 2 deletions.
Binary file added materials/images/penguins-logo.png
Binary file added materials/images/serial-parallel-exec.png
21 changes: 20 additions & 1 deletion materials/sections/parallel-computing-in-r.qmd
@@ -122,6 +122,20 @@ Finally, maybe one of these [NSF-sponsored high performance computing clusters (

Note that these clusters have multiple nodes (hosts), and each host has multiple cores. So this is really multiple computers clustered together to act in a coordinated fashion, but each node runs its own copy of the operating system, and is in many ways independent of the other nodes in the cluster. One way to use such a cluster would be to use just one of the nodes, and use a multi-core approach to parallelization to use all of the cores on that single machine. But to truly make use of the whole cluster, one must use parallelization tools that let us spread out our computations across multiple host nodes in the cluster.

## Modes of parallelization

Several different approaches can be taken to structuring a computer program to take advantage of the hardware capabilities of multi-core processors. In the typical, and simplest, case, each task in a computation is executed serially, in order from first to last. The total computation time is the sum of the times of all of the subtasks. In the next figure, a single core of the processor is used to sequentially execute each of the five tasks, with time flowing from left to right.

![Serial and parallel execution of tasks using threads and processes.](images/serial-parallel-exec.png)

In comparison, the middle panel shows two approaches to parallelization on a single computer: Parallel Threads and Parallel Processes. With **multi-threaded** execution, a separate thread of execution is created for each of the 5 tasks, and these are executed concurrently on 5 of the cores of the processor. All of the threads are in the same process and share the same memory and resources, so one must take care that they do not interfere with each other.
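
As a concrete illustration, here is a minimal sketch of implicit multi-threading using the `data.table` package (an aside, not part of this lesson's exercises); `data.table` runs many of its internal operations, such as grouped summaries, across multiple threads that all share the one copy of the data in the process's memory:

```{r}
library(data.table)

getDTthreads()   # how many threads data.table will use by default
setDTthreads(4)  # assumption: cap this session at 4 threads

# All threads share this single copy of dt in the process's memory
dt <- data.table(x = runif(1e7), g = sample(letters, 1e7, replace = TRUE))
system.time(dt[, .(mean_x = mean(x)), by = g])  # grouped summary, multi-threaded
```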

With **multi-process** execution, a separate process is created for each of the 5 tasks, and these are executed concurrently on the cores of the processor. The difference is that each process has its own copy of the program memory, and changes are merged when each child process completes. Because each child process must be created and its resources must be marshalled and unmarshalled, there is more overhead in creating a process than a thread. "Marshalling" is the process of transforming the memory representation of an object into a format that can be stored or transmitted, which is what allows data to be passed between processes in serialized form.
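
For example, here is a minimal sketch of multi-process execution with `parallel::mclapply()`, assuming a Unix-like system that supports forking:

```{r}
library(parallel)

# A stand-in for a task that takes about one second of real work
slow_task <- function(i) {
  Sys.sleep(1)
  i^2
}

# Serial: roughly 4 seconds for 4 tasks
system.time(res_serial <- lapply(1:4, slow_task))

# Parallel: each task runs in its own forked child process, so the four
# tasks take roughly 1 second in total; each child's result is serialized
# back to the parent when it completes
system.time(res_parallel <- mclapply(1:4, slow_task, mc.cores = 4))
```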

Finally, **cluster parallel** execution is shown in the last panel, in which the processes for the tasks are spread across multiple computers in a cluster. Again, there is a setup cost associated with creating and marshalling resources for each task, which now includes the overhead of moving data from one machine to the others in the cluster over the network. This further increases the cost of creating and executing multiple processes, but can be highly advantageous when accessing exceedingly large numbers of processing cores on clusters.
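
A sketch of this mode using base R's `parallel` package appears below; the host names are hypothetical placeholders for the nodes of a real cluster, so the chunk is not evaluated:

```{r}
#| eval: false
library(parallel)

# Hypothetical node names -- substitute the hosts of your own cluster
hosts <- c("node1.example.org", "node2.example.org")
cl <- makePSOCKcluster(hosts)

# Functions and data must be shipped to each worker over the network,
# which is the marshalling overhead described above
slow_task <- function(i) {
  Sys.sleep(1)
  i^2
}
clusterExport(cl, "slow_task")

res <- parLapply(cl, 1:4, slow_task)
stopCluster(cl)
```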

The key to performance gains is to ensure that the overhead associated with creating new threads or processes is small relative to the time it takes to perform a task. Somewhat unintuitively, when the setup overhead time exceeds the task time, parallel execution will likely be slower than serial.
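
One quick way to see this in practice (again a sketch, assuming a Unix-like system) is to parallelize a task that is far too small to repay its setup cost:

```{r}
library(parallel)

tiny_task <- function(i) i + 1

# The work per task is trivial, so the cost of creating a child process
# for each task (mc.preschedule = FALSE forces one fork per element)
# dominates, and the "parallel" version is typically much slower
system.time(res_serial <- lapply(1:1000, tiny_task))
system.time(res_forked <- mclapply(1:1000, tiny_task,
                                   mc.cores = 4, mc.preschedule = FALSE))
```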

## When to parallelize

It's not as simple as it may seem. While in theory each added processor would linearly increase the throughput of a computation, there is overhead that reduces that efficiency. For example, the code and, importantly, the data need to be copied to each additional CPU, and this takes time and bandwidth. Plus, new processes and/or threads need to be created by the operating system, which also takes time. This overhead reduces the efficiency enough that realistic performance gains are much less than theoretical, and usually do not scale linearly as a function of processing power. For example, if the time that a computation takes is short, then the overhead of setting up these additional resources may actually overwhelm any advantages of the additional processing power, and the computation could potentially take longer!
@@ -153,10 +167,15 @@ ggplot(cpu_perf, aes(cpus, speedup, color=prop)) +

So, it's important to evaluate the computational efficiency of requests, and work to ensure that any additional compute resources brought to bear will pay off in terms of increased work being done. With that, let's do some parallel computing...

## Pleasingly Parallel task lists
## Pleasingly Parallel with Palmer Penguins

::: {layout-ncol="2"}

When you have a list of repetitive tasks, you may be able to speed it up by adding more computing power. If each task is completely independent of the others, then it is a prime candidate for executing those tasks in parallel, each on its own core. For example, let's build a simple loop that uses sampling with replacement to do a bootstrap analysis. In this case, we select `bill_length_mm` and `species` from the `palmerpenguins` dataset, randomly subset it to 100 observations, and then iterate across 3,000 trials, each time resampling the observations with replacement. We then run a logistic regression fitting species as a function of length, and record the coefficients from each trial to be returned.

![](images/penguins-logo.png)
:::

```{r}
library(palmerpenguins)
library(dplyr)
2 changes: 1 addition & 1 deletion materials/session_09.qmd
@@ -1,5 +1,5 @@
---
title: "Parellel Processing"
title: "Parallel Processing"
title-block-banner: true
---

