- Preface (Stark)
- Introduction (Kitzes)
- Assessing the Reproducibility of a Research Project (Rokem, Marwick, Staneva)
- The Basic Reproducible Workflow Template (Kitzes, Turek)
- Introducing the Case Studies (Imamoglu, Turek)
- PART 1: High-Level Case Studies
- PART 2: Low-Level Case Studies
- Lessons Learned (Huff et al.)
- Supporting Reproducible Science (Ram, Marwick)
- Glossary of Terms and Techniques (Rokem, Chirigati)
## Editors
Justin Kitzes, Fatma Imamoglu, Daniel Turek
## Supplementary Chapter Authors
Philip Stark
Justin Kitzes
Daniel Turek
Fatma Imamoglu
Kathryn Huff
Karthik Ram
Ariel Rokem
Ben Marwick
Valentina Staneva
Fernando Chirigati
## Case Study Chapter Contributors
Mary K. Askren
Anthony Arendt
Lorena A. Barba
Pablo Barberá
Kyle Barbary
Carl Boettiger
You-Wei Cheah
Garret Christensen
Devarshi Ghoshal
Chris Gorgolewski
Jan Gukelberger
Chris Holdgraf
Konrad Hinsen
David Holland
Chris Hartgerink
Kathryn Huff
Fatma Imamoglu
Justin Kitzes
Natalie Koh
Andy Krause
Randy LeVeque
Tara Madhyastha
José Manuel Magallanes
Ben Marwick
Olivier Mesnard
K. Jarrod Millman
K. A. S. Mislan
Kellie Ottoboni
Gilberto Pastorello
Russell Poldrack
Karthik Ram
Ariel Rokem
Rachel Slaybaugh
Valentina Staneva
Philip Stark
Daniel Turek
Daniela Ushizima
Zhao Zhang
## Lessons Learned
- Pain Points
- Recommendations from the Authors
- A Little Data
- Needs
## Pain Points
- People and Skills
- Dependencies, Build Systems, and Packaging
- Hardware Access
- Testing
- Publishing
- Data Versioning
- Time and Incentives
- Data Restrictions
## Incentives
- verifiability
- collaboration
- efficiency
- extensibility
- "focus on science"
- "forced planning"
- "safety for evolution"
## Recommendations
- version control your code
- open your data
- automate everywhere possible
- document your processes
- test everything (a minimal example follows this list)
- use free and open tools
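As an illustration of the "test everything" recommendation above, here is a minimal sketch of a unit test in the pytest style. The `clean_temperatures` function and its threshold values are hypothetical stand-ins for a typical analysis step, not code from any of the case studies.

```python
# test_cleaning.py -- a minimal, hypothetical unit test (pytest style).
# `clean_temperatures` is an illustrative analysis step, not from the book.
import numpy as np

def clean_temperatures(values, lower=-60.0, upper=60.0):
    """Drop physically implausible temperature readings (degrees C)."""
    arr = np.asarray(values, dtype=float)
    return arr[(arr >= lower) & (arr <= upper)]

def test_clean_temperatures_drops_outliers():
    raw = [12.3, -999.0, 15.1, 72.0]   # -999.0 and 72.0 are implausible
    cleaned = clean_temperatures(raw)
    assert cleaned.tolist() == [12.3, 15.1]
```

Running such tests with `pytest` as part of an automated build or continuous integration step turns them into an ongoing check rather than a one-off.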
## Recommendations: Continued
- avoid excessive dependencies
- when dependencies can't be avoided, package their installation
- host code on a collaborative platform (e.g. GitHub)
- get a Digital Object Identifier for your data and code
- avoid spreadsheets; plain text data is preferred ("timeless," even)
- explicitly set pseudorandom number generator seeds (see the sketch after this list)
- workflow and provenance frameworks may be too clunky for most scientists
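To illustrate the seed-setting recommendation above, here is a minimal sketch assuming NumPy. The seed value and the stochastic step are placeholders; the point is the pattern of fixing and recording a seed so that a rerun reproduces the same draws.

```python
# Hedged sketch: fixing a pseudorandom seed so a stochastic analysis reruns identically.
import numpy as np

SEED = 20160401  # placeholder value; record it alongside the code and results
rng = np.random.default_rng(SEED)

sample = rng.normal(loc=0.0, scale=1.0, size=1000)  # any stochastic step
print(f"seed={SEED}, sample mean={sample.mean():.4f}")  # identical on every rerun
```

Without the explicit seed, rerunning the same script would produce different draws, making exact numerical reproduction of figures and tables impossible.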
## Recommendations: Outliers
> ... in our estimation, if someone
> was to try to reproduce our research it would probably be more
> natural for them to write their own scripts as this has the
> additional benefit that they might not fall into any error
> we may have accidentally introduced in our scripts.
## Recommendations: Outliers
> Scientific funding and the number of scientists available to do the work is finite. Therefore not every scientific result can, or should be reproduced.
## Emergent Needs
- Better education of scientists in tools that are more robust for reproducibility.
- Widely used tools should be more reproducible, so that the lowest-common-denominator tool does not undermine reproducibility.
- Improved configuration and build systems for portably packaging software, data, and analysis workflows.
- Reproducibility at scale for high-performance computing.
- Standardized hardware configurations and experimental procedures for limited-availability experimental apparatuses.
- Better understanding of why researchers don't respond to the delayed incentives of unit testing as a practice.
- Greater adoption of unit testing, irrespective of programming language.
- Broader community adoption of publication formats that allow parallel editing (i.e., any plain text markup language that can be version controlled).
- Greater scientific adoption of new industry-led tools and platforms for data storage, versioning, and management.
- Increased community recognition of the benefits of reproducibility.
- Incentive systems for cases where reproducibility is not self-incentivizing.
- Standards around scrubbed and representational data, so that analyses can be investigated separately from restricted data sets.
- Community adoption of file format standards within some domains.
- Domain standards that translate well outside of their own scientific communities.