On Github sallain / tcdl-data-archivist
The role of the archivist in research data management
Basic intro to Archivematica
Three case studies:
Ontario Council of University Libraries + Dataverse
University of York & University of Hull + Hydra
Compute Canada + Globus
We've been thinking about the role of the library in research data management for several years.
Digital management platforms must adequately preserve data.
Domain-specific tools and proprietary formats make this difficult.
Research data management is a digital preservation problem.
Archivists are pretty good at digital preservation.
Archivematica is a web- and standards-based open-source application that allows your institution to preserve long-term access to trustworthy, authentic and reliable digital content.
Based on standards and best practices
Format and repository agnostic
Small enough to run on a laptop
Robust enough to handle petabytes of data
Modular
Free and open source
Familiar
It was built around archival standards, using archival terminology, and it's meant to anticipate archival digital preservation workflows. (Of course, everyone's welcome to use it!)
Luckily, since RDM is a digital preservation problem, it's well suited to RDM workflows as well.
Jisc-funded projects aimed at encouraging tool and workflow development to tackle various aspects of research data management.
Available project funding was anywhere from £250k to £1m.
York and Hull were successful at obtaining funding for all three phases of the project.
Goal was to take advantage of Archivematica's modularity to integrate Archivematica into a research data management architecture that would include other applications for deposit, management, etc.
Established Hydra-based institutional repository, but no digital preservation capacity.
Wanted to be able to offer assured long-term preservation to faculty members.
After Phase 1 (testing), the archivists at York and Hull identified several areas where Archivematica was not sufficient to meet their RDM needs.
They applied for Phase 2 funding to begin developing solutions for the identified problems.
Five deliverables:
York and Hull successfully applied for Phase 3 funding to build a proof-of-concept platform, making use of the deliverables to integrate Archivematica with Hydra.
Meanwhile, Artefactual is bundling the new features into the 1.5 and 1.6 releases of Archivematica.
Dataverse is an open-source repository platform developed at Harvard.
Ontario Council of University Libraries' tech branch, Scholars Portal, hosts a Dataverse instance that is available to academics at Ontario's 21 universities.
Dataverse excels as a deposit and access system, but has limited digital preservation functionality.
Goal of the project was to let users deposit content through Dataverse, running Archivematica preservation tasks in the background.
Important: users can deposit content over time, rather than all at once!
The integration makes use of Automation Tools, an Archivematica library that facilitates requests for updated information from Dataverse's API. An ingest script was also developed to manage ingest tasks.
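The "deposit over time" idea can be sketched roughly as follows. This is a minimal illustration, not the actual Automation Tools or ingest script code: the function name, the record keys ("id", "updated_at") and the shape of the stored state are all assumptions made for the example, and no real Dataverse or Archivematica API calls are shown. The idea is simply to compare each dataset's last-updated time against the last time it was ingested, and queue anything new or changed.

```python
from datetime import datetime


def datasets_needing_ingest(datasets, last_ingested):
    """Return the ids of datasets that are new or changed since last ingest.

    Hypothetical inputs, for illustration only:
    - datasets: list of dicts with "id" and "updated_at" (ISO 8601 string),
      standing in for whatever the repository API reports.
    - last_ingested: dict mapping dataset id to the ISO 8601 timestamp of
      the last completed preservation ingest.
    """
    pending = []
    for record in datasets:
        updated = datetime.fromisoformat(record["updated_at"])
        previous = last_ingested.get(record["id"])
        # Never ingested, or updated after the last ingest: queue it.
        if previous is None or updated > datetime.fromisoformat(previous):
            pending.append(record["id"])
    return pending


if __name__ == "__main__":
    reported = [
        {"id": "ds-1", "updated_at": "2016-05-02T09:00:00+00:00"},
        {"id": "ds-2", "updated_at": "2016-04-01T09:00:00+00:00"},
    ]
    state = {"ds-2": "2016-04-15T00:00:00+00:00"}
    print(datasets_needing_ingest(reported, state))
```

Run on a schedule, a check like this would let a researcher keep adding files to a dataset while preservation tasks pick up the changes in the background.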
The Dataverse integration project resulted in a proof-of-concept workflow that isn't currently scheduled for release. However, it's available as a separate public branch of the project on GitHub.
At some point in the future, we would love to generalize the code and make it available in a public release.
Compute Canada is a national, non-profit organization that provides high-performance research computing resources for 70 institutions and 10,000+ researchers.
Compute Canada uses Globus' Transfer Service and Publication Service tools to store and provide access to research data.
Scholars Portal holds terabytes of climate data from the CPDN. This corpus was used to pilot an integration where Archivematica acts as a bridge between the Globus Transfer and Publication Services and Compute Canada datastores.
This proof of concept is also not scheduled for release. We're working on getting it into a separate public branch of the Archivematica project on GitHub.
Twitter: @archivalistic | @archivematica
Email: sallain@artefactual.com or info@artefactual.com
This presentation: bit.do/data-archivist