Data Packages: put it in a box




A talk about Data Packages from csv,conf 2014

On GitHub: nickstenning/put-it-in-a-box

A brief history of container shipping and an introduction to the Data Package.

Nick Stenning (@nickstenning).

We are living in the era of break-bulk data.

The impact of containerisation

year,  vessel,             loadingCostPerTon
1956,  SS Ideal-X,         0.16
1956,  medium break-bulk,  5.86
          
year,  tonsPerManHour,     hoursInPort
1959,  0.627,              504
1976,  4234,               18
          
  • Simplicity
  • Opacity
  • Moderated diversity

The Data Package

datapackage.json

{
  "name": "gold-prices"
}
          

datapackage.json

{
  "name": "gold-prices",
  "title": "Gold Prices (Monthly in USD)",
  "license": "odc-pddl",
  "sources": [{
    "name": "Bundesbank statistics",
    "web": "http://www.bundesbank.de/Navigation/[...]"
  }],
  "resources": [{
    "path": "data.csv",
    "format": "csv",
    "schema": {
      "fields": [
        {"type": "date", "name": "date"},
        {"type": "number", "name": "price"}
      ]
    }
  }],
  "version": "1.0.42"
}
          

datapackage.json

{
  "name": "gold-prices",                   # Identifier ([a-z0-9._-]+)
  "title": "Gold Prices (Monthly in USD)", # One-sentence description
  "license": "odc-pddl",                   # Open License identifier
  "sources": [...],                        # Array of sources (defined)
  "resources": [...],                      # Array of resources (part-defined)
  "version": "1.0.42"                      # Semver-compatible version
}
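
The annotations above can be turned into a small check. The following is a minimal sketch, not an official validator: `check_descriptor` is a hypothetical helper, treating `name` as the only required key is an assumption based on the minimal example earlier in the talk, and the version check is a simplified MAJOR.MINOR.PATCH pattern rather than full semver.

```python
import json
import re

# Identifier pattern taken from the annotated slide: [a-z0-9._-]+
NAME_PATTERN = re.compile(r"[a-z0-9._-]+")
# Simplified version pattern (assumption): MAJOR.MINOR.PATCH only,
# ignoring semver pre-release and build suffixes.
VERSION_PATTERN = re.compile(r"\d+\.\d+\.\d+")


def check_descriptor(descriptor):
    """Return a list of problems found in a datapackage.json-style dict."""
    problems = []

    name = descriptor.get("name")
    if not isinstance(name, str) or not NAME_PATTERN.fullmatch(name):
        problems.append("name must match [a-z0-9._-]+")

    version = descriptor.get("version")
    if version is not None and not (
        isinstance(version, str) and VERSION_PATTERN.fullmatch(version)
    ):
        problems.append("version should look like MAJOR.MINOR.PATCH")

    # sources and resources are described as arrays on the slide.
    for key in ("sources", "resources"):
        if key in descriptor and not isinstance(descriptor[key], list):
            problems.append(f"{key} must be an array")

    return problems


descriptor = json.loads("""
{
  "name": "gold-prices",
  "title": "Gold Prices (Monthly in USD)",
  "license": "odc-pddl",
  "resources": [{"path": "data.csv", "format": "csv"}],
  "version": "1.0.42"
}
""")

print(check_descriptor(descriptor))               # []
print(check_descriptor({"name": "Gold Prices"}))  # one problem: bad name
```

Returning a list of problems rather than raising on the first one makes it easier to report everything wrong with a package in a single pass.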
          

data.csv

date,price
1950-01-01,34.730
1950-02-01,34.730
...
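
To illustrate what the schema buys you, here is a rough sketch of reading this CSV with the field types from the descriptor. `read_rows` and `CASTERS` are hypothetical names, only the `date` and `number` types from the example are handled, and mapping `number` to `Decimal` is a choice made here rather than something the talk specifies.

```python
import csv
import io
from datetime import date
from decimal import Decimal

# Casters for the two field types used in the example schema.
# Other schema types are deliberately not handled in this sketch.
CASTERS = {
    "date": date.fromisoformat,  # expects YYYY-MM-DD, as in the sample rows
    "number": Decimal,           # keeps "34.730" exact instead of a float
}


def read_rows(csv_file, schema):
    """Yield one dict per CSV row, casting each column by its schema type."""
    reader = csv.DictReader(csv_file)
    for row in reader:
        yield {
            field["name"]: CASTERS[field["type"]](row[field["name"]])
            for field in schema["fields"]
        }


schema = {
    "fields": [
        {"type": "date", "name": "date"},
        {"type": "number", "name": "price"},
    ]
}

# In-memory stand-in for data.csv, using the two rows shown on the slide.
sample = io.StringIO("date,price\n1950-01-01,34.730\n1950-02-01,34.730\n")
rows = list(read_rows(sample, schema))
print(rows[0])
```

Without the schema, every value would arrive as a string and each consumer would have to guess the types again.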
            

Data Package

Tabular Data Package

  • ≥1 data file (AKA "resource")
  • resource format == CSV
  • resource schema (JSON Table Schema)
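
The three requirements above could be checked along these lines. This is a sketch under assumptions: `is_tabular` is a hypothetical helper, it identifies CSV resources by the `format` key as in the example descriptor, and it only checks that a schema with a non-empty `fields` list is present, not that the schema is valid JSON Table Schema.

```python
def is_tabular(descriptor):
    """Rough check of the three Tabular Data Package requirements."""
    resources = descriptor.get("resources")

    # Requirement 1: at least one data file ("resource").
    if not isinstance(resources, list) or len(resources) < 1:
        return False

    for resource in resources:
        if not isinstance(resource, dict):
            return False
        # Requirement 2: resource format is CSV.
        if resource.get("format") != "csv":
            return False
        # Requirement 3: resource carries a schema with some fields.
        schema = resource.get("schema")
        if not isinstance(schema, dict) or not schema.get("fields"):
            return False

    return True


gold_prices = {
    "name": "gold-prices",
    "resources": [{
        "path": "data.csv",
        "format": "csv",
        "schema": {
            "fields": [
                {"type": "date", "name": "date"},
                {"type": "number", "name": "price"},
            ]
        },
    }],
}

print(is_tabular(gold_prices))              # True
print(is_tabular({"name": "gold-prices"}))  # False
```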

Tools

http://dataprotocols.org/

https://github.com/datasets/

Just put it in a box already.

Rights

This presentation is released under the terms of the Creative Commons Attribution 4.0 International License.