Reference environments for reproducible research



On GitHub: danielghurley/reference_environments

Daniel Hurley

This is a reveal.js presentation. Press Space to advance the slides, or use the arrow keys or the mouse. Press O to zoom out and Space or Left-Mouse to zoom back in. If you're on a phone, you can probably swipe too.

Topics

  • What are we doing when we publish?
  • But: problems with reproduction
  • 'Reproducible research': two concepts
  • A simple solution
  • Examples of reference environments
  • Some challenges
  • Conclusion

What are we doing when we publish a (peer-reviewed, computational) paper?

One way to look at it

  • We are making assertions about the relationship between some data, some code (software objects) and some results

  • We are claiming that others should be able to follow what we have done, and come to similar results

  • Even if... they differ in the conclusions drawn from those results

But

There are a lot of reasons why this might not be easy

Here are some classics

  • Dependency problems
  • Configuration issues
  • Environment differences
  • Access to data
  • Licensing
  • Architecture
  • Scale

The consequences are...

Computational research is hard work to reproduce [1, 2]

Many publications still don't provide code or data [3]

Those that do may not contain enough detail to reproduce results [4]

Some responses

There are a lot of tools to enable 'reproducible research'

'Recipe'

  • Description
  • Literate programming
  • Sweave/knitr
  • iPython

'Snapshot'

  • Cloud environments
  • Archiving results
  • Read-only
  • Self-contained

And there are two conceptions of 'reproducible research'

Both are important...

...and ideally we build a snapshot from the recipe

A simple step to improve reproducibility

  • For every research output, we can produce a minimal software stack required to reproduce some or all of the results
  • We call this a reference environment

Reference environments generally include a basic operating system, libraries, tools and data
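
As an illustration only, the four components named above could be recorded in a small manifest. This is a minimal sketch, not the format used by the reference environments project: the `Manifest` class, its field names, and the example values are all hypothetical.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class Manifest:
    """Hypothetical record of what a reference environment contains."""
    base_os: str
    libraries: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)
    data: list[str] = field(default_factory=list)

    def missing_parts(self) -> list[str]:
        """Return the names of components that have not been filled in."""
        return [name for name, value in asdict(self).items() if not value]


# Placeholder values, chosen only to show the shape of the record.
manifest = Manifest(
    base_os="example-linux",
    libraries=["example-lib"],
    tools=["octave"],
)

print(json.dumps(asdict(manifest), indent=2))
print(manifest.missing_parts())  # the data component is still empty
```

Writing the stack down in one place makes it easier to see which part of it is still undocumented before the environment is built.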

From a single set of installation/provisioning scripts, we can produce them in different formats

  • As virtual machines
  • As containers
  • As cloud images
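
One way to picture producing several formats from a single source is sketched below. It is an assumption-laden illustration rather than the project's actual tooling: the script name `provision.sh`, the base image `ubuntu:22.04`, the box `ubuntu/jammy64`, and the helper functions are all hypothetical. The point is only that a container definition and a virtual machine definition can both defer to the same provisioning script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceEnvironment:
    """Hypothetical description of one reference environment."""
    name: str
    provision_script: str = "provision.sh"  # the single source of configuration
    docker_base: str = "ubuntu:22.04"       # assumed base image
    vagrant_box: str = "ubuntu/jammy64"     # assumed base box


def render_dockerfile(env: ReferenceEnvironment) -> str:
    """Container format: copy the shared script into the image and run it."""
    return (
        f"FROM {env.docker_base}\n"
        f"COPY {env.provision_script} /tmp/{env.provision_script}\n"
        f"RUN sh /tmp/{env.provision_script}\n"
    )


def render_vagrantfile(env: ReferenceEnvironment) -> str:
    """Virtual machine format: run the same script via the shell provisioner."""
    return (
        'Vagrant.configure("2") do |config|\n'
        f'  config.vm.box = "{env.vagrant_box}"\n'
        f'  config.vm.provision "shell", path: "{env.provision_script}"\n'
        "end\n"
    )


if __name__ == "__main__":
    env = ReferenceEnvironment(name="example-paper")
    print(render_dockerfile(env))
    print(render_vagrantfile(env))
```

Because both generated files only reference the script, a change to the software stack is made once, in the script, and every format picks it up on the next build.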

Why?

Because people use different tools and different environments, and this is OK

'Single-sourcing' environment configuration means we can deliver reproducible research to the broadest possible audience

We don't have to be limited by 'who uses Docker', or 'who has access to cloud infrastructure'

How does it work?

  • Bond graph modelling of biochemical networks
    Technologies used: Octave, LaTeX
    Locations: Vagrant VM, Docker container, bootable ISO image

  • Network link prediction
    Technologies used: MATLAB
    Locations: Vagrant VM, Docker container, bootable ISO image

  • Network deconvolution
    Technologies used: MATLAB
    Locations: Vagrant VM, Docker container, bootable ISO image

  • Machine learning approaches to modelling eukaryotic transcription
    Technologies used: R
    Locations: Vagrant VM, Docker container, bootable ISO image

  • Hormonal regulation of renal excretion
    Technologies used: R, OCaml
    Locations: Vagrant VM, bootable ISO image

  • Parallel data mining using WEKA
    Technologies used: Java
    Locations: Vagrant VM, Docker container, bootable ISO image

Example reference environments

But (again)

What about the challenges we discussed at the beginning?

  • Dependency problems
  • Configuration issues
  • Environment differences
  • Access to data
  • Licensing
  • Architecture
  • Scale

These are all genuine challenges

But they all have mitigating strategies

And they are barriers to all reproducibility of research, not to reference environments alone

Concluding thoughts

  • A result that is hard to reproduce is not a weak or trivial result, but it is still hard to reproduce
  • Partial reproducibility is valuable
  • The issue of reproducibility is ultimately one of dissemination and communication

Further resources

A detailed guide to using and making reference environments

Systems Biology Lab website

Daniel Hurley's website
