On GitHub: ablwr/free_your_workflows
Ashley Blewer, NYPL && Dinah Handel, CUNY TV
open sourcing audio visual archiving and preservation workflows, software, and file formats
[Introductions] Hi, I'm Ashley and this is Dinah. Dinah is going to start off this talk by talking about open sourcing audio visual archiving and preservation workflows, software, and file formats in THEORY.
Librarians and archivists love good documentation.
First, librarians and archivists try to create, and think it is important to have, clear, concise documentation of workflows. A large part of many librarians' and archivists' work is to figure out how something is done (how materials are accessioned, processed, or digitized, for example) and translate those steps, what we consider a "workflow," into a piece of documentation.
Librarians and archivists care about making information accessible.
The second assumption is that librarians and archivists care about making information accessible to others. This is sort of a fundamental principle of librarianship, and although it certainly becomes more complicated in practice, we typically describe the library and archive work that we do in service of access.
These two concepts often are not paired together.
Yet, often, these two things don't go hand in hand. We're more concerned, and rightly so, with making the materials in our collections accessible to patrons than with sharing how we made them accessible. Occasionally, we might make presentations, write blog posts or journal articles, or even make public versions of internal workflows available on our library websites. But overall there isn't an infrastructure or expectation in place for sharing the nitty-gritty details of library and archival workflows.
Why the gap between theory and practice?
"... but it’s not our fault!!!" We all want to do the best thing, but some things get in the way.(but, like, actually writing them though)
Writing documentation is hard! It's a real skill to bridge the gap between a code base and human comprehension.
open workflows
To contextualize the rest of our talk, it might be good to lay down some definitions, so that we're all on the same page when we're talking about these technologies. Each of the terms and concepts I'll talk about plays an important role in open sourcing workflows.
open source
First, what do we mean by open source? The Open Source Initiative defines open source software as "software that can be freely used, changed, and shared (in modified and unmodified form) by anyone."
open access
What we mean by open access is unrestricted access to information. Open access is commonly used in the world of academic publishing to denote a work that is freely available to read without a subscription to journals or databases. The same applies to code: people have to be able to find and download the source code for free.
open file formats
Open file formats are a third significant component. A file format defines the structure and type of data that is stored in a file. An open file format, then, is one with an accessible, published specification for the structure and storage of data in a file. Open file formats are usually maintained by a standards organization, and they are platform independent, meaning they can be used across software and operating systems. Using an open file format with publicly available specifications is necessary in digital preservation: it allows us to continue to render digital objects as operating systems and software change, which is crucial in combating obsolescence. Audio visual file formats in particular contain a lot of information about the file, and with proprietary formats that information can make future playback difficult.
microservices
A microservices framework breaks down extensive, multi-step processes into distinct pieces. Each microservice accomplishes one discrete task. I like to think about microservices as modular code that can be combined in numerous ways depending on the desired outcome.
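To make that concrete, here is a minimal sketch of the idea; the script names are hypothetical, not real services, but a workflow really can be this plain: a few single-purpose scripts run in sequence, any of which can be swapped out later without touching the others.

    #!/usr/bin/env bash
    # hypothetical workflow: each script is one microservice that does exactly one task
    input="$1"
    ./verify_checksums.sh "$input"    # fixity check on the incoming package
    ./make_access_copy.sh "$input"    # create an access derivative
    ./move_to_storage.sh "$input"     # deliver the package to preservation storage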
Microservices (at institutional level)
To bring this around to practice, implementation can be translated across institutions. At NYPL, on my team, we really strive for a microservices approach to development, and I'm speaking about microservices in a larger context than our primary definition in this talk. We want to have the ability to switch out any one part of a project for something better that comes along, and not have everything tangled up in one overly large, bloated application. (I know y'all know what I'm talking about.) We do this by having clear endpoints to send data from one place to another. And what better way to know that one software component of your overall workflow can be swapped out in the future than if it can also easily be swapped into another institution's code base?
git-init
I think one of the most promising tools is git, either from the command line or through a graphical user interface. I work with GitHub a lot, and I see it as a possible centralized tool for hosting workflow documentation from different institutions. Sometimes something like GitHub gets seen by archivists as irrelevant to the work that they do because they don't write code, but the GitHub website doesn't have to be only a repository for code. It could also contain documentation of workflows, such as PDFs or readmes in markdown that outline an institution's workflow and specifications. A benefit of using GitHub for workflow and policy documentation is version control: it allows the public to see how and why a workflow or policy has changed, which provides greater transparency into archival policy decisions and labor. Some libraries and archives already do this, and it's awesome to see.
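As a small illustration (the file name and commit messages here are made up), putting a workflow document under version control takes only a few commands:

    git init workflows
    cd workflows
    # add an existing workflow document (markdown, a PDF, whatever the institution has)
    git add digitization-workflow.md
    git commit -m "Document current digitization workflow"
    # later, after editing the document to reflect a policy change:
    git commit -am "Switch preservation masters to FFV1/Matroska"
    # the history shows how and why the workflow changed over time
    git log -- digitization-workflow.md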
Neatrour, Anna and Woolcott, Liz. (2015, November 24). Library Workflow Exchange: Sharing Library Innovation [blog post]. Retrieved from https://www.diglib.org/archives/10844/
Library Workflow Exchange is not really a tool, but it is a space that I find exciting with regards to sharing knowledge about library and archival workflows. The two librarians who started it, Liz Woolcott and Anna Neatrour, share the same dream that I do, quote, "a magical database that would allow us to find out what other libraries are doing to automate workloads," and so they built the Library Workflow Exchange. There are options to self-submit your workflow for institutions that don't have the ability to host documentation on institutional websites, and the site, quote, "pulls workflows from websites, blogs, conference presentations, Github, and a host of other places."
media microservices
Media microservices are a set of open source microservice scripts that I've been working on as part of my time as a National Digital Stewardship Resident at CUNY Television. Media microservices are written in bash and perform much of the labor of processing digital media, so they essentially are the archiving and preservation workflow, and much of our documentation of how they work or what they do is stored in the comments within the code. As Ashley noted earlier, what's useful about implementing a microservices approach is that we can make modifications to individual processes without having to overhaul the entire workflow, and add in new functionality when needed. With microservices, we also aren't restricted to one workflow imposed by a software system, which makes it easier to adapt as technology changes. This is especially important with AV materials, as digital preservation software doesn't always have the functionality, or isn't optimized, to deal with complex and large audio visual files. While media microservices are developed for our institutional needs, they also work as individual services. The gif I'm showing here is of the makeyoutube microservice, which transcodes one or multiple inputs according to the specifications for upload to YouTube. If an institution wants to use all of the media microservices, they can: they come with a configuration file that can be set up to process and deliver based on local needs, and there are comprehensive instructions as part of the readme.
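To give a flavor of what one of these services does, here is an illustrative ffmpeg command along the lines of a make-an-access-copy-for-YouTube transcode, H.264 video and AAC audio in an MP4 wrapper. This is a rough sketch, not the actual makeyoutube script, and the real options differ:

    # not the real makeyoutube script; an approximation of a YouTube-friendly access copy
    ffmpeg -i input.mov \
      -c:v libx264 -pix_fmt yuv420p -crf 18 \
      -c:a aac -b:a 320k \
      input_youtube.mp4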
vrecord
Another example of an open source AV workflow is vrecord, which is open source audio visual digitization software. vrecord was initially created at CUNY Television, following a hardware and software update that made the proprietary software Final Cut Pro unusable for digitization. vrecord is downloadable via GitHub and Homebrew, and has grown into a project that is worked on by many individuals at various institutions. vrecord uses the open source software FFmpeg and works with Blackmagic Design's software development kit. While vrecord doesn't solve the problem of expensive and difficult-to-obtain hardware needed for digitization, it does make digitization more accessible.
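For anyone who wants to try it, installation looks roughly like this; I'm assuming the AMIA Open Source Homebrew tap is still named amiaopensource/amiaos, so check the vrecord readme for current instructions:

    brew tap amiaopensource/amiaos
    brew install vrecord
    vrecord    # launch the digitization interface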
QCTools is a project I work on, coming from the Bay Area Video Coalition.
It's software that helps detect problems in digitized analog video. QCTools is for running quality control and analysis on these videos, looking for errors, which is great for doing inspections on video after digitization or after it comes back from a vendor. QCTools works on single files right now, but the project has recently received grant funding and support from Indiana University to let it work as a microservice and to add a database and web server for batch-level processing and analysis.
MediaConch is another project that I work on. It's for video file conformance checking: it makes sure your video files are what they say they are. It is funded by the European Commission, and the software is required to be open source.
MediaConch is built on MediaInfo, which is, I'd say, the most-used media microservice among information professionals. MediaInfo gives you information about your files! But more importantly, it does it very quickly, and it even works on partial files that are in transit. So it's very easy to integrate MediaInfo into much larger projects (and this is what we intend to do with MediaConch too). Not only do you get information about your files, but you know whether those files are happy and healthy and conforming to your institution's policy.
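As a quick example (the file name is a placeholder), the command-line version gives you a human-readable report, or structured output you can hand off to other scripts:

    mediainfo mymaster.mkv                                        # human-readable report
    mediainfo --Output=XML mymaster.mkv > mymaster_mediainfo.xml  # machine-readable, for other tools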
An example of integration of these tools is within Artefactual's Archivematica, which uses MediaInfo and will soon be using MediaConch as part of the suite of services that get files into long-term, preservation-level storage.
MediaConch and Artefactual use Matroska, an open video format currently going through the standardization process via the Internet Engineering Task Force's CELLAR (Codec Encoding for LossLess Archiving and Realtime transmission) working group. This is to ensure its longevity as a recommended digital preservation file format.
ffmprovisr!
ffmprovisr is a good example of a small project that helps many people at a lot of different institutions, and it has had work contributed by people at many of them. It's a platform for sharing very small scripts, specifically FFmpeg commands. It's hosted on the AMIA Open Source committee's GitHub page. There's been a lot of collaboration in the Issues section and through pull requests, even with contributions from people who work on FFmpeg itself. It's great to see so many people come together and share knowledge.
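One example of the kind of small recipe collected there: rewrapping a file into a different container without re-encoding, so the underlying video and audio streams are untouched (file names are placeholders):

    ffmpeg -i input.mov -c copy -map 0 output.mkv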
A/VAA, the A/V Artifact Atlas, is a wiki for sharing video playback problems, where people can contribute their video playback errors, explain or describe problems, or figure out why their videos look weird.
... and the rest will follow ...
So free your workflows and the rest will follow!
Thanks!
@dericed, @ndsr, @nypl, FOSS contributors
... and the rest will follow ...