aorra-eresearch-2013





On Github tjdett / aorra-eresearch-2013

AORRA Automating Reef Report Cards

Tim Dettrick, Andre Gebers, Jane Hunter
The University of Queensland

Part of the larger eReefs Project
- Name
- Software engineer at UQ eResearch Lab working on the AORRA project.
- The goal of the AORRA project: "improve the delivery of annual reports relating to the health of the Great Barrier Reef".
- automating repetitive manual steps → reduce delays and improve the quality

“Reef Report Card”?

So, a quick background on what these reef report cards are...
- Annual economic benefit of Great Barrier Reef → ~$5 billion
- Water pollution - major threat to Great Barrier Reef.
- 40 catchments across 400 thousand square kilometers
- waterways in the catchment area contain significant sediment and chemical run-off from farming and grazing.
- contaminated water → reef - detrimental in a number of ways
  - interfering with coral development
  - increasing the outbreaks of coral eating starfish

2009 - Reef Water Quality Protection Plan

Long term goal:

To ensure that by 2020 the quality of water entering the reef from broadscale land use has no detrimental impact on the health and resilience of the Great Barrier Reef.

Sets water quality and land management targets required to meet the long-term goal.

Reef Water Quality Protection Plan is a collaborative program designed to improve the quality of water feeding into the Great Barrier Reef through improved land management practices in reef catchments. Joint commitment between the Australian and Queensland Governments. Long term goal: "ensure that by 2020 the quality of water entering the reef has no detrimental impact on its health". Sets specific actions and deliverables to be completed by 2018.

Reef Report Card

Reports on progress towards those targets, using measurements taken on a (mostly) annual basis.

- Reef report cards are issued for each calendar year - tracking progress against the 2009 baseline measurements.
Metrics include things like:
- the % of farmers who have adopted best practice land management
- % of coverage for groundcover and riparian vegetation
- catchment water pollution loads
- marine coral health & sea-grass abundance and
- water quality out on the reef.
Report cards: a scientific basis for assessing the effectiveness
Contain analysis based on reports and data provided by a number of Queensland and Federal Government agencies:
- Department of Natural Resources & Mines (DNRM);
- Department of Agriculture, Fisheries and Forestry;
- Great Barrier Reef Marine Park Authority;
- Department of Science, Information Technology, Innovation and the Arts; &
- CSIRO
Reports and collected data are processed by the Reef Secretariat under DP&C.

Released so far

Baseline 2009 → Released August 2011

2nd 2010 → Released April 2013

3rd 2011 → Released July 2013

Feedback most useful when it closely follows the completion of data collection. Analysis of the raw data by relevant agencies takes considerable time, as does collation and publishing. Media coverage (and resulting public engagement) also benefits from timely delivery of new report cards. Interest: "current" health of the reef > its "historical" health 2011 report: extreme weather events heavily impacted water quality

Why automate?

  • ⏩ Produce reports faster
  • ☑ Reduce data-entry errors / improve quality assurance
  • ♻ Data sharing/re-use
  • ⌛ Long term access and preservation
So, we want to automate the process of creating reef report cards.
- central reason: we believe if we can reduce the number of manual, repetitive steps involved, then the reports can be produced faster.
- Fairly straightforward proposition - must be careful that the system doesn't introduce more work than it saves.
Another good reason to automate:
- manual copying → incorrect re-entry
- initial prototyping: automated generation of key charts with simple form
- increased the data points in the charts → increasing no. of errors
Report card data into an online system → improving tracking of that data. Gives us opportunities to share and reuse data, and it also lets us improve long-term access and preservation of that data.

Our work so far

Document Management

+

Chart Generation

So, now that the background is out of the way, let's take a look at what we've done so far.
Scoping study identified the need for centralised document management.
- Most documents supplied by contributing agencies → Microsoft Word & Excel.
- multiple iterations of feedback and improvement - "version control by email".
As discussed earlier, it also identified the need to automatically generate charts:
- graphic design work → time intensive
- contributors limited to simple Excel-based charts when producing analyses
System so far - best characterised as a simple Document Management System with a number of domain-specific extensions.
Scoping study identified existing software we could have extended. A problem emerged:
- best simple open-source document management systems → PHP
- best open-source chart tools → Java
- PHP to Java bridges → probably not a good move
Already had Java-based chart libraries from scoping study - brief look at Java DMS - all seemed aimed at enterprises with:
- thousands of users, and
- small army of developers to extend them.
In the end, we decided to write our own DMS:
- simple → only the functionality required
- having control of the entire code-base → integrating easier

Tech we're using

Apache Jackrabbit + Play Framework + Scala

... and quite a bit more:

Batik, POI, Tika, docx4j, JFreeChart, CRaSH

- followed the lead of other document management systems - build on top of a Java Content Repository - Apache Jackrabbit.
- Using JVM → didn't need a servlet container - opted for the Play Framework.
  - not quite as quick to develop with as RoR / Django - still pretty good
- Play Framework is built by Typesafe - founded by the creators of the Scala programming language
- mix in Scala where it made sense.
  - hadn't used it before, so limited
  - improved some collaborative features of the system.

Data Input Formats

Excel spreadsheets, not machine-friendly XML

Good news: contributors supply their data as XML! Bad news: that's because MS Office documents are zipped bundles of XML. Rather than try to change that... We worked with Reef Secretariat to specify some standard formats for supplied spreadsheets, and then set to work on extracting the data we needed from them. Apache POI makes extracting spreadsheet data fairly straight-forward - reading cell colours is a little temperamental w/ non-MS office suites - fortunately, only developers using those
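The "zipped bundles of XML" point can be seen with nothing but the JDK. The sketch below is an illustration only, not AORRA code, and it does not use Apache POI: it builds a tiny in-memory zip whose entry names follow the usual .xlsx layout, then lists the XML entries inside. The class name `OfficeZipPeek` and both helper methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class OfficeZipPeek {

    /** Builds a tiny zip in memory that mimics the entry layout of an .xlsx file. */
    static byte[] fakeWorkbook() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("xl/workbook.xml"));
            zip.write("<workbook/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("xl/worksheets/sheet1.xml"));
            zip.write("<worksheet/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Returns the names of all .xml entries in the zipped bytes, in archive order. */
    static List<String> xmlEntries(byte[] zipped) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipped))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().endsWith(".xml")) {
                    names.add(entry.getName());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(xmlEntries(fakeWorkbook()));
    }
}
```

Pointing `xmlEntries` at the bytes of a real spreadsheet would list its sheet and style parts in the same way, which is roughly the raw material a library such as POI works from.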

Vector Charts

Data extraction, processing & graphics generation

Having extracted the data we then used a combination of Java's AWT libraries, Apache Batik and JFreeChart to produce vector-based charts. Example just uses AWT and Batik - more conventional charts use JFreeChart.
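To give a feel for what "vector-based" means here, the sketch below writes a small bar chart as SVG markup by hand. This is a stand-in technique, not the AWT + Batik or JFreeChart pipeline the project uses; the class name `SvgBarSketch`, the layout constants, and the sample figures are all made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class SvgBarSketch {

    /**
     * Renders one horizontal bar per entry. Values are percentages in 0..100.
     * Labels are assumed to contain no XML special characters (no escaping is done).
     */
    static String render(Map<String, Double> percentages) {
        int barHeight = 20;
        int gap = 10;
        int labelWidth = 120;
        int fullWidth = 200; // width of a 100% bar
        int height = percentages.size() * (barHeight + gap);

        StringBuilder svg = new StringBuilder();
        svg.append(String.format(Locale.ROOT,
            "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"%d\" height=\"%d\">\n",
            labelWidth + fullWidth, height));
        int y = 0;
        for (Map.Entry<String, Double> e : percentages.entrySet()) {
            double width = fullWidth * e.getValue() / 100.0;
            svg.append(String.format(Locale.ROOT,
                "  <text x=\"0\" y=\"%d\">%s</text>\n", y + 15, e.getKey()));
            svg.append(String.format(Locale.ROOT,
                "  <rect x=\"%d\" y=\"%d\" width=\"%.1f\" height=\"%d\"/>\n",
                labelWidth, y, width, barHeight));
            y += barHeight + gap;
        }
        svg.append("</svg>\n");
        return svg.toString();
    }

    public static void main(String[] args) {
        Map<String, Double> sample = new LinkedHashMap<>();
        sample.put("Groundcover", 50.0); // invented sample values
        sample.put("Riparian", 75.0);
        System.out.print(render(sample));
    }
}
```

Because the output is shapes and text rather than pixels, the same chart scales cleanly for print, which is the main reason to prefer vector output for report cards.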

Not a workflow system

System assists the process, rather than changing it.

It's sometimes tempting to build workflow management into a system, to better "manage the process". There aren't any good reasons to do that here:
- not many people involved once the raw data has been processed
- email workflow already → everybody already talking to each other
Set out to avoid incorporating processes whenever possible:
- no fixed data hierarchy
- minimal folder-based permission system
Limits how much we can automate assembly → avoids need to understand as many business rules.

Chart creation

So, time for a demo. [start video]
AORRA is a modern web app, using asynchronous background requests to update the user interface.
In this screencast:
- files being uploaded as a group to AORRA
- flip through the generated charts
- all the charts are downloadable as a single archive
  - vector formats (SVG, EMF)
  - raster formats (PNG)
  - suitable for print and web publications
- demonstrate versioning: we upload the file again
- delete the file
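The versioning behaviour in the demo (upload the same file again, get a new version) can be modelled in a few lines. This is a toy in-memory sketch, not how AORRA stores files: the real system sits on a Java Content Repository. The class name `VersionedStoreSketch` is hypothetical, and the choice that deleting a file also drops its history is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class VersionedStoreSketch {

    // path -> every uploaded content, oldest first
    private final Map<String, List<String>> versions = new HashMap<>();

    /** Stores content under a path and returns the new version number (1-based). */
    int upload(String path, String content) {
        List<String> history = versions.computeIfAbsent(path, p -> new ArrayList<>());
        history.add(content);
        return history.size();
    }

    /** Returns the most recent content, or null if the path is unknown or deleted. */
    String latest(String path) {
        List<String> history = versions.get(path);
        return history == null ? null : history.get(history.size() - 1);
    }

    /** Removes the file and its history (an assumption of this sketch). */
    boolean delete(String path) {
        return versions.remove(path) != null;
    }

    public static void main(String[] args) {
        VersionedStoreSketch store = new VersionedStoreSketch();
        store.upload("/reports/loads.xlsx", "first draft");
        int v = store.upload("/reports/loads.xlsx", "second draft");
        System.out.println("version " + v + ": " + store.latest("/reports/loads.xlsx"));
        store.delete("/reports/loads.xlsx");
    }
}
```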

Real-time notifications

To aid collaboration, AORRA provides real-time notifications of other user activity. [start video]
This screencast shows two different users logged in:
- top user
  - navigates to a file
  - watches it for changes
- bottom user
  - navigates to the same file
  - indicates they are editing the file by clicking "Edit"
  - immediately updates the related counter for both users
  - adds a new notification message
- top user
  - marks the notification as read
- bottom user
  - uploads a new version of the file
  - notification immediately appears
- top user
  - reads the new notification
  - deletes all notifications
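The watch-and-notify flow from the screencast can be sketched as a small in-memory model. It covers only the bookkeeping (who watches what, who gets which message); pushing updates to the browser in real time is left out. Nothing here is taken from the AORRA code base: the class name `NotificationSketch`, the message wording, and the rule that users are not notified about their own actions are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class NotificationSketch {

    private final Map<String, Set<String>> watchers = new HashMap<>(); // file -> users watching it
    private final Map<String, List<String>> inbox = new HashMap<>();   // user -> pending messages

    /** Registers a user's interest in changes to a file. */
    void watch(String user, String file) {
        watchers.computeIfAbsent(file, f -> new LinkedHashSet<>()).add(user);
    }

    /** Records an event and queues a message for every watcher except the actor. */
    void publish(String actor, String file, String event) {
        for (String user : watchers.getOrDefault(file, Collections.emptySet())) {
            if (!user.equals(actor)) {
                inbox.computeIfAbsent(user, u -> new ArrayList<>())
                     .add(actor + " " + event + " " + file);
            }
        }
    }

    /** Returns a copy of the user's pending messages, oldest first. */
    List<String> pending(String user) {
        return new ArrayList<>(inbox.getOrDefault(user, Collections.emptyList()));
    }

    /** Deletes all of the user's notifications. */
    void clear(String user) {
        inbox.remove(user);
    }

    public static void main(String[] args) {
        NotificationSketch hub = new NotificationSketch();
        hub.watch("top", "loads.xlsx");
        hub.publish("bottom", "loads.xlsx", "is editing");
        hub.publish("bottom", "loads.xlsx", "uploaded a new version of");
        System.out.println(hub.pending("top"));
        hub.clear("top");
    }
}
```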

What's next?

  • 📊 Web-focused charts & infographics
  • 🌏 Geospatial data
  • 💬 Online editing and collaboration
So, what's next?
- Reef Secretariat have been focused on print-format report cards - website reusing the print assets
- start targeting the web → interactive charts and infographics
- integrate geospatial data - drill down on a map
- Production side - like to reduce the reliance on Word documents

Challenges ahead

Linking report cards to raw data.

Ultimately, if we're going to provide the ability to drill down using geospatial data, we'd like to go all the way back to the raw data. The Health-e-Waterways project does something similar already.
This will be a significant challenge:
- Reef Secretariat doesn't receive anything close to the raw data.
- source data for spreadsheets → generally aggregated already
- aggregated by contributor software systems
Plan: slowly work our way down.

Longer-term

Integrating with eReefs to do it with web services.

eReefs → ? ? ? ← AORRA

Longer-term we'd like AORRA to receive lower-level data from the larger eReefs project, which is working to produce web data services for this data. Realistically though this is a long-term aspiration. eReefs has a phased development out to 2016, and their APIs will have to provide up-to-date data from operational systems before we can use them. For now, AORRA is very much focused on providing tools to improve reporting in the short-term while a longer-term solution emerges.

Questions?

@tjdett

uq-eresearch.github.io/aorra-eresearch-2013

- So, that concludes my talk. - Thanks for your time. - slides & videos - link or QR code Does anybody have any questions?