c4lbc14-fits




FITS optimization presentation for Code4lib BC 2014

On GitHub: mistydemeo/c4lbc14-fits

hi

http://archivematica.org/

FITS

http://fitstool.org/

FITS

  • File ID
  • Metadata extraction

VisualVM

Main bottlenecks

  • JVM lag
  • DROID
  • MD5 checksum calculation
  • JHOVE
  • XSLT compilation

JVM lag

  • Time to start up a fresh Java VM
  • Happens every time you run "fits.sh"
  • Depends on the computer, but measured at between 0.5 s and 10 s
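
Per-launch startup cost can be estimated by timing repeated launches of a command. This is a minimal sketch, not the measurement method used for these slides: `mean_launch_seconds` is a hypothetical helper, and the command shown is a stand-in (a Python interpreter that exits immediately). To measure FITS, substitute your own "fits.sh" invocation.

```python
import subprocess
import sys
import time


def mean_launch_seconds(command, runs=3):
    """Return the average wall-clock time of launching `command` from scratch."""
    start = time.perf_counter()
    for _ in range(runs):
        # Each iteration starts a brand-new process, so startup cost is paid every time.
        subprocess.run(
            command,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    return (time.perf_counter() - start) / runs


# Stand-in command (an assumption): replace with your own fits.sh call.
stand_in = [sys.executable, "-c", "pass"]
print(f"{mean_launch_seconds(stand_in):.3f} s per launch")
```

Timing a command that does almost no work isolates startup cost from the work done per file.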

JVM lag

Example

  • Transfer containing 17,000 files
  • Average 2s JVM lag per file

9.4 hours of time wasted!
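
The arithmetic behind that figure, as a small sketch. The function name `wasted_hours` is made up for illustration; the inputs are the numbers from the example above.

```python
def wasted_hours(file_count: int, lag_seconds_per_file: float) -> float:
    """Total startup lag, in hours, when every file pays the lag once."""
    return file_count * lag_seconds_per_file / 3600


# 17,000 files x 2 s = 34,000 s
print(f"{wasted_hours(17_000, 2.0):.1f} hours")  # prints "9.4 hours"
```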

Solution: nailgun

Source: Lance Fisher, Flickr (https://flic.kr/p/c9Qpn)

CC-BY-SA https://creativecommons.org/licenses/by-sa/2.0/

JVM server

  • Maintains a persistent JVM with a class loaded
  • Allows CLI tools to be run with zero startup lag
  • We contributed a nailgun server startup script to FITS 0.8.0

DROID

  • FITS 0.6.x used DROID 3
  • Slow startup due to XML parser
  • FITS spent more time parsing XML than identifying files!

DROID

  • Archivematica switched to using FIDO
  • FITS has since upgraded to DROID 6, which is faster

MD5

  • FITS always calculated an MD5 checksum for every file
  • Took 10% or more of the processing time on large files
  • Archivematica never used it!

MD5

  • We submitted a change that makes MD5 calculation configurable
  • Included in FITS 0.8.0
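
The idea of an opt-in checksum can be sketched as follows. This is an illustration only, not the FITS change itself: `describe_file` and its `compute_md5` flag are hypothetical names, and the real FITS setting is not shown here.

```python
import hashlib
import os
import tempfile


def describe_file(path, compute_md5=False, chunk_size=1024 * 1024):
    """Collect basic facts about a file; hash it only when asked to."""
    record = {"path": path, "size": os.path.getsize(path)}
    if compute_md5:
        digest = hashlib.md5()
        with open(path, "rb") as handle:
            # Read in chunks so large files are never held in memory whole.
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
        record["md5"] = digest.hexdigest()
    return record


# Demo on a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as sample:
    sample.write(b"hello")
sample_path = sample.name

print(describe_file(sample_path))                    # no "md5" key: file contents are never read
print(describe_file(sample_path, compute_md5=True))  # includes "md5"
os.remove(sample_path)
```

With the flag off, the file contents are never read, which is where the savings on large files come from.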

XSLT / XML

  • FITS uses XSLT to translate tool output to its format
  • The XSLT stylesheets are compiled on startup

XSLT / XML

  • Compilation only needs to happen once
  • But the compiled cache is thrown away when FITS quits
  • Nailgun fixes this too!

Thanks!

Misty De Meo

mdemeo@artefactual.com