On GitHub: danielghurley/reference_environments
Daniel Hurley
We are making assertions about the relationship between some data, some code (software objects) and some results
We are claiming that others should be able to follow what we have done, and come to similar results
Even if... they differ in the conclusions drawn from those results
There are a lot of reasons why this might not be easy
Computational research is hard work to reproduce [1,2]
Many publications still don't provide code or data [3]
Those that do may not contain enough detail to reproduce results [4]
Both are important...
and ideally we build a snapshot from the recipe
Reference environments generally include a basic operating system, libraries, tools and data
From a single set of installation/provisioning scripts, we can produce them in different formats
As virtual machines
As containers
As cloud images
Because people use different tools and different environments, and this is OK
'Single-sourcing' environment configuration means we can deliver reproducible research to the broadest possible audience
We don't have to be limited by 'who uses Docker', or 'who has access to cloud infrastructure'
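As a rough illustration of single-sourcing, the sketch below renders two build recipes from one shared provisioning script. It is a minimal sketch under stated assumptions, not the tooling used in this project: the `ReferenceEnvironment` class, the `render_*` helpers, the file name `provision.sh`, and the base image and box names are all hypothetical. Only the Dockerfile instructions (`FROM`, `COPY`, `RUN`) and the Vagrant shell provisioner syntax are standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceEnvironment:
    """Hypothetical description of one reference environment."""
    docker_base: str       # base image for the container build (assumed name)
    vagrant_box: str       # base box for the virtual machine build (assumed name)
    provision_script: str  # the single shared installation script


def render_dockerfile(env: ReferenceEnvironment) -> str:
    """Container recipe: copy the shared script into the image and run it."""
    return "\n".join([
        f"FROM {env.docker_base}",
        f"COPY {env.provision_script} /tmp/provision.sh",
        "RUN bash /tmp/provision.sh",
        "",
    ])


def render_vagrantfile(env: ReferenceEnvironment) -> str:
    """Virtual machine recipe: run the same script via the shell provisioner."""
    return "\n".join([
        'Vagrant.configure("2") do |config|',
        f'  config.vm.box = "{env.vagrant_box}"',
        f'  config.vm.provision "shell", path: "{env.provision_script}"',
        "end",
        "",
    ])


def render_all(env: ReferenceEnvironment) -> dict:
    """Map each output file name to its generated contents."""
    return {
        "Dockerfile": render_dockerfile(env),
        "Vagrantfile": render_vagrantfile(env),
    }


# Illustrative values only; substitute whatever base systems a project uses.
env = ReferenceEnvironment(
    docker_base="ubuntu:22.04",
    vagrant_box="ubuntu/jammy64",
    provision_script="provision.sh",
)
outputs = render_all(env)
```

Both recipes point at the same `provision.sh`, so a change to the installation steps reaches every format. A cloud image target could be added as a third renderer in the same way.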
What about the challenges we discussed at the beginning?
These are all genuine challenges
But they all have mitigating strategies
And they are barriers to reproducible research in general, not to reference environments alone