Docker image for running R code in OpenSAFELY, both locally and in production.
- docker
- docker-compose
- just
And the tests additionally require
- curl
- python3
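A quick way to confirm these tools are installed before building is sketched below. The `missing_tools` helper is hypothetical and not part of this repository; it only checks that each command name listed above is on your `PATH`, and assumes `docker-compose` is installed as a standalone command.

```shell
# Hypothetical helper: print any of the given commands that are not on PATH,
# one per line. Prints nothing if everything is available.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tool names are taken from the requirement lists above.
missing_tools docker docker-compose just curl python3
```

No output means every listed tool was found.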
just build VERSION
where VERSION is either v1 or v2.
Under the hood, this builds VERSION/Dockerfile using docker-compose and buildkit.
In v1, we currently build a lot of packages, so an initial build on a fresh checkout can take a long time (e.g. an hour). To alleviate this, the v1/Dockerfile is carefully designed to use the local buildkit cache, so subsequent rebuilds should be very fast.
In v2, where possible we install binary R packages for Linux from the Posit Public Package Manager (PPPM), and we use the pak package to install them. This has several advantages, including parallel package downloads. As a result, building the v2 image takes only about 5 minutes, roughly an order of magnitude faster than building the v1 image.
- Enough bandwidth to comfortably push potentially gigabytes' worth of Docker layers.
- (Under v1) Several hours' worth of CPU time to re-compile all the packages (if this is the first time you've done this and you don't have them cached locally).
- Push access to ghcr.io.
If you don't have all these things then please don't start.
Before adding a package, ask an OpenSAFELY team member with R experience to approve it.
By default, a new package is installed from CRAN. To add one, run
just add-package-v1 PACKAGE
If you need to install a package from another CRAN-like repository, specify its URL as the REPOS argument.
just add-package-v1 PACKAGE REPOS
This will attempt to install and build the package and its dependencies, and update v1/renv.lock. It will then rebuild the R image with the new lock file and test it.
Note that the first time you do this it will need to compile every included R package (because you won't have the R package builds cached locally). This can take several hours. (When we solve the caching problem here we'll be able to do this all in CI.)
Add a new section for the new package/s to v2/packages.toml. If all the packages are from CRAN then the section should be structured as follows.
[relevant-section-title]
packages = ["package-name-1", "package-name-2"]
comment = "Explanatory comment about why the package/s are being added."
If the package is not on CRAN, please add it to https://opensafely-core.r-universe.dev by adding it to packages.json in the registry repository https://github.com/opensafely-core/opensafely-core.r-universe.dev. Then enter the relevant Linux binary package URL as an additional repos key-value pair in the new section in v2/packages.toml. Currently this is done as follows.
repos = "https://opensafely-core.r-universe.dev/bin/linux/noble/4.4/"
If the package requires any runtime dependencies, add those to v2/dependencies.txt.
Then build the v2 image.
just build v2
You will need to configure authentication to GitHub's container registry first. See GitHub's documentation.
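As a minimal sketch of what that configuration typically looks like, the helper below pipes a personal access token into `docker login`. The `ghcr_login` function name and the `CR_PAT` environment variable are assumptions made for illustration; GitHub's documentation remains the authoritative reference for the token scopes you need.

```shell
# Hypothetical helper: log in to GitHub's container registry as the given user.
# Assumes CR_PAT holds a personal access token with the write:packages scope.
ghcr_login() {
  # Pass the token on stdin so it does not appear in the process list.
  printf '%s\n' "$CR_PAT" | docker login ghcr.io -u "$1" --password-stdin
}
```

With the token exported as `CR_PAT`, you would call it as `ghcr_login YOUR_GITHUB_USERNAME`.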
When you have authentication configured, run:
just publish VERSION
Commit and push the small resulting change to a branch, then get the changes merged via pull request. The change should only be a few extra lines: under v1 in v1/packages.csv, v1/packages.md, and v1/renv.lock; under v2 in v2/packages.toml, v2/packages.md, and v2/pkg.lock.
The review is a trivial exercise because the Docker image has already been pushed to GitHub.
The updated image will need pulling into production. This is covered
separately in the tech team manual. If you don't have access, ask in #tech.
If the package requires any system build dependencies (e.g. -dev packages with
headers), they should be added to VERSION/build-dependencies.txt. If it requires
runtime dependencies, they should be added to VERSION/dependencies.txt. Packages
don't advertise their system dependencies, so you may need to figure them out
by trying to add the package and reading any error output on failure.
If the package still fails to build, you may be able to install an older version.
Find a previous version at https://cran.r-project.org/src/contrib/Archive/{PACKAGE}/ and attempt to install it specifically with
just add-package-v1 PACKAGE@VERSION
The rstudio image is based on the r image, with rstudio-server added. To build it, run
just build-rstudio VERSION
To test that rstudio-server appears at http://localhost:8787, run
just test-rstudio VERSION
And then push the new rstudio image to the GitHub container registry with
just publish-rstudio VERSION
In v2, we choose a date from which to install the packages from CRAN. We strongly recommend that the version of R in the image is the release version of R on this date. R release dates can be found on the R Wikipedia page.
In v2, when installing packages we use a Posit Public Package Manager (PPPM) snapshot repository for the chosen CRAN_DATE.
We use a fixed date because CRAN follows a rolling release model. As such, we know that on a particular date CRAN has tested these package versions with the release version of R, which makes this a very stable approach to choosing a set of package versions. We can also reliably add further packages at their versions on this date, without updating dependency packages already included in the image.
The CRAN apt repository for R is available here (note that you may need to amend the Ubuntu codename in the URL if using a newer base image). Find the package number you require and edit the number in v2/dependencies.txt and v2/build-dependencies.txt.
Then amend the CRAN_DATE and REPOS arguments in v2/env.
To update run
just build v2
To test the updated image run
just test v2
Choose a version of R.
Choose a CRAN date on which that version of R was the release version.
We follow a very similar approach to the versioned stack of the Rocker project. They list their R versions and CRAN dates on their wiki.
We recommend not choosing a date within the first week of a new version of R being released, because a lot of packages may be updated on CRAN during this time.
You then need to check that a PPPM snapshot repository exists for your chosen date. Navigate to https://p3m.dev/client/#/repos/cran/setup and inspect your chosen date. Set this date in the REPOS argument in v2/env.
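The snapshot check can also be scripted. The sketch below is an assumption-laden illustration: both function names are hypothetical, the URL layout (`https://p3m.dev/cran/DATE`) follows PPPM's date-based snapshot URLs rather than anything defined in this repository, and the exact value expected by REPOS in v2/env may differ (for example, it may include a Linux binary path).

```shell
# Hypothetical helper: build the URL of a PPPM CRAN snapshot for a date (YYYY-MM-DD).
# The URL layout is an assumption based on PPPM's date-based snapshot URLs.
pppm_snapshot_url() {
  printf 'https://p3m.dev/cran/%s\n' "$1"
}

# Hypothetical helper: succeed only if the snapshot serves a package index.
# Needs network access; downloads the index and discards it.
pppm_snapshot_exists() {
  curl --fail --silent --location --output /dev/null \
    "$(pppm_snapshot_url "$1")/src/contrib/PACKAGES"
}
```

For example, `pppm_snapshot_exists 2024-10-30 && echo "snapshot available"` would print a message only if the index for that date could be fetched.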
If you choose a version of R that is not the current version, we recommend following the Rocker approach and choosing the day before the next version of R was released as the CRAN date. For example, R 4.4.2 was released on 2024-10-31, so for R 4.4.1 we would choose 2024-10-30 as the CRAN date. In our case, we are using the current version of R (4.4.2), so we choose the latest available date on PPPM as the CRAN date.
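The "day before the next release" rule is simple date arithmetic. As a small sketch, the hypothetical `cran_date_before` function below computes it with GNU `date` (as shipped on Ubuntu; the BSD `date` on macOS uses different flags).

```shell
# Hypothetical helper: print the day before a given R release date (YYYY-MM-DD).
# Requires GNU date; -u avoids any daylight-saving edge cases.
cran_date_before() {
  date -u -d "$1 -1 day" +%F
}

# R 4.4.2 was released on 2024-10-31, so the CRAN date for R 4.4.1 is:
cran_date_before 2024-10-31   # prints 2024-10-30
```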
You can find out when the next release of R is scheduled for on the R developer page.
We set the HTTPUserAgent in the appropriate places so that we obtain binary R packages for Linux from the PPPM. There is additional information about this on the PPPM website.
In v2, compared to v1, several packages have either been superseded by other packages or have been removed from CRAN. These include dummies, maptools (terra could be provided as a replacement if required), mnlogit, rgdal, and rgeos (the sf package, which acts as a replacement for rgdal and rgeos, is still included). Several additional packages, such as sjPlot, have been added in response to requests.