-
Notifications
You must be signed in to change notification settings - Fork 2
R5 principles
Benureau and Rougier in their work from 2018 defined five characteristics that a scientific code should possess in order to be regarded as a first-class research output. These five characteristics translate into principles that those who write scientific code should follow. In short, a scientific code should be: 1. Re-runnable (R1) 2. Repeatable (R2) 3. Reproducible (R3) 4. Reusable (R4) 5. Replicable (R5)
The above five terms were initially coined by in the presentation "What is Reproducibility? - The R* brouhaha" from 2016.
Although these characteristics are tailored to scientific code development, they can still be very useful applied to other (research) digital objects such as data per se and workflows.
Readers will find some level of overlap with Open Science and FAIR data principles but should note that the goal is to provide a different perspective on the same problem - namely, how research outputs can maintain value over the long term.
The code should be re-runnable (i.e., executable)
We often face the issue that our code which initially worked perfectly in a year's time cannot be executed. Typically, you will end up updating libraries, reinstalling the operating system, updating the compiler, changing the computer or some of its components, etc. all of which will result in a failure to execute the code. It pretty impossible to create a future-proof code, however, it is possible to recreate and save the old execution environment and thus assure that the code will be re-runnable. To do this we can create a virtual machine, which replicates the old execution environment, deploy the code inside of it, and then simply share/publish the virtual machine for anyone to use it. Docker containers are a popular choice for such work.
The code should produce the same results every time it is executed
This principle assures that the code produces the same output over successive runs. One way to achieve this is to test the code on a predetermined set of inputs for which the resulting output is known.
The code should allow those who use it to reobtain the published results
Even if we achieve R1 and R2 the code supplied with a virtual machine when executed on other computers probably will not generate absolutely the same results. This is due to the fact that different computers will have different hardware components, so execution of the code will generate slightly different results, so some tolerances should be set on the output results which should be indicators whether the code is reproducible. To fully achieve R3 one would not only need a copy of the execution environment but also a copy of the old hardware.
The code should be easy to use, understand and modify
Every year, countless hours and significant resources are lost because of poorly written code. Making your program reusable means it can be easily used, and modified, by you and other people, inside and outside your lab. To meet this principle documenting code is essential, while writing code that is clear and readable.
Almost all programming languages are following certain standards in the way how code documenting should be done, chose one of them do not try to come up with your own standard. The code documentation is far more detailed and focused description of your code comparing to often chaotic code comments seeded in the source code. If possible write code such that it does not require comments to be understood.
A clear and unambiguous algorithmic description of the code should available as a reference allowing the code replication in other programming languages than one chosen for an initial algorithm implementation
Replicability is the implicit assumption that an article that does not provide the source code makes: that the description it provides of the algorithms is sufficiently precise and complete to re-obtain the results it presents. Here, replicating implies writing a new code matching the conceptual description of the article, in order to reobtain the same results. One way of writing the algorithmic description is to use pseudo-code that can be translated to any programming language.
Source: R5 principles - Digitalization and Open Science, URL: https://like-itn-digitalization.readthedocs.io/en/latest/4_R5/
Sprint Planning
|
Sprint Review
Wikis:
data
|
model-factory
|
opendrr-api
|
opendrr
|
python-env
|
riskprofiler-cms