Data Science Show & Tell – Enrico Spinielli – Let's Do It!




(web) Slides for the Data Science Show & Tell meetup


Data Science Show & Tell

Enrico Spinielli

June 9, 2016

Live slides available at https://espinielli.github.com/showandtell

PDF and source of slides available at https://github.com/espinielli/showandtell

Given:

  • life is short
  • I am lazy
  • You should not lie
  • Humans are intelligent (w/ caveats ;-)
  • ... and not all of them are working at Eurocontrol

...it follows

  • I'll (procrastinate on boring stuff and only) work on useful/fun projects
  • Automation saves me from repeating boring and/or forgotten tasks
  • I'll be open to let others criticize/scrutinize/learn
  • ...and I'll learn back from them
  • I'll strive to produce truthful explanations/visualizations

Let's Do It!

The Axioms (IMHO)

  • Value of data --> visualization
  • Visualization --> WWW
  • Make data available
  • no Web: then you do not exist, i.e. EC/PRB/PRU
  • no boring stuff: enough of it, do better.
  • truthful: no evil
  • visualization: human perception & best practices!
  • data availability!

The Plan (Jan 2015)

  • Generate a (static) website for the PRU
  • Version control it all
  • Automate!
  • static: no need of server, no authentication, no hacks!
  • version control: done by systems, not by humans via naming conventions in folders...
  • automation: the only way to scale

Now, a year and a half later

Sections

Demo time

  • Make sure to navigate from Data to Metadata (last column in the table)
  • Check the Graphs out
  • Enjoy the Studies

the official PRU site, http://ansperformance.eu

Features

&

<user>.github.io

Tech Docs

Release

Bugs

Editing

  • easy, i.e. textual (ASCII, no HTML): separate content from style
  • nice Math (via MathJax): \[f(x)=\sum_{n=0}^\infty\frac{f^{(n)}(a)}{n!}(x-a)^n\]
  • bibliography: cite and style
  • templates for different kinds of pages (Definitions, list of ANSP's, RN's)

Markdown

No need to edit in HTML: we (mainly) use Markdown (from Pandoc)

## Methodology

[Horizontal en-route flight efficiency methodology](/r/m/hfe_pi.html)
is fully consistent with the Single European Sky (SES)
Performance Scheme [see {% cite pru-hfe-pi --file aviation %}].

## Column naming and types

### HFE data

{:.metatable}
| Column name | Src | Label     | Column description    | Example |
|-------------|-----|-----------|-----------------------|---------|
| YEAR        | NM  | YEAR      | Reference year        | 2014    |
| MONTH_NUM   | NM  | MONTH_NUM | Month (numeric)       | 9       |
| MONTH_MON   | NM  | MONTH_MON | Month (3-letter code) | JAN     |

Biblio

Generation

  • from DB queries to website: scripts
  • Jekyll: MD -> HTML
  • Pandoc: MD -> PDF
  • some from Rmarkdown/[knitr] in the near future
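
The two conversions above can be sketched as a small build script. This is only an illustration, not the actual PRU scripts: the directory and file names are hypothetical, and it assumes the `jekyll` and `pandoc` executables are installed and on the PATH. By default it only reports what it would run.

```python
import subprocess
from pathlib import Path


def jekyll_command(source: Path, destination: Path) -> list[str]:
    """Command line that renders a Jekyll site (Markdown -> HTML)."""
    return ["jekyll", "build",
            "--source", str(source),
            "--destination", str(destination)]


def pandoc_command(markdown_file: Path, pdf_file: Path) -> list[str]:
    """Command line that converts one Markdown file to PDF with Pandoc."""
    return ["pandoc", str(markdown_file), "-o", str(pdf_file)]


def build(commands: list[list[str]], dry_run: bool = True) -> list[str]:
    """Run each command in order; with dry_run=True only report them."""
    log = []
    for cmd in commands:
        log.append(" ".join(cmd))
        if not dry_run:
            # check=True stops the build at the first failing step
            subprocess.run(cmd, check=True)
    return log


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    steps = [
        jekyll_command(Path("site"), Path("_site")),
        pandoc_command(Path("report.md"), Path("report.pdf")),
    ]
    for line in build(steps):
        print(line)
```

Keeping the command construction separate from the execution makes the same list of steps usable both locally and from a CI job.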

Workflows

Trigger

Travis CI

  • But we NEED MORE to scale: for example checks on data consistency
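
A data consistency check of this kind could look like the sketch below. It is an assumption about what such a check might do, not an existing PRU script: it reuses the `YEAR`, `MONTH_NUM` and `MONTH_MON` columns from the HFE metadata table, and the sample rows are invented for illustration.

```python
import csv
import io

MONTH_CODES = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
               "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def check_rows(rows):
    """Return a list of (line_number, message) for inconsistent rows."""
    problems = []
    for n, row in enumerate(rows, start=2):  # line 1 is the CSV header
        try:
            int(row["YEAR"])
            month = int(row["MONTH_NUM"])
        except (KeyError, TypeError, ValueError):
            problems.append((n, "YEAR or MONTH_NUM missing or not an integer"))
            continue
        if not 1 <= month <= 12:
            problems.append((n, f"MONTH_NUM out of range: {month}"))
            continue
        expected = MONTH_CODES[month - 1]
        actual = row.get("MONTH_MON")
        if actual != expected:
            problems.append((n, f"MONTH_MON is {actual!r}, expected {expected!r}"))
    return problems


if __name__ == "__main__":
    # Invented sample data: the second and third data rows are inconsistent.
    sample = "YEAR,MONTH_NUM,MONTH_MON\n2014,9,SEP\n2014,13,JAN\n2014,1,FEB\n"
    for line_number, message in check_rows(csv.DictReader(io.StringIO(sample))):
        print(f"line {line_number}: {message}")
```

A CI job could run a script like this on every published CSV file and fail the build when the list of problems is not empty.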

ToDo's

DB

  • new schema for production: PRUPROD
  • use current ones for development (PRUDEV) and testing (PRUTEST)
  • version control [PL]SQL code, i.e. which code was used to produce which indicators
  • version control the DB used for prod: regulatory repository

Data

  • clarify dimensions
  • improve the Meta part of it: definitions, methodology
  • add more data and (web) API (see ICAO iSTARS)
  • generate the spreadsheets if CSV files/API are not enough
  • Metadata is to be transparent and to avoid confusion, i.e. define what you name/use (delay, trajectory, FIR)
  • the API is to make the data available: remember we are not the only smart ones around

Viz

More Viz

  • more Studies/Articles w/ interactivity (see NYT, WP)
  • more thinking of what is worth plotting
  • more graphs in Graphs
  • one year old experiment click here
  • a recent one w/ STATFOR click here

Wild thoughts

  • personally I am not interested in BI or industrial-like dashboards
  • I know that little of our NMIR is used

Just my own

  • PRR live on the website, with the PDF generated from the source in the git repo
  • add Jupyter notebooks to the website for case studies

Conclusions

We want you!

  • Share knowledge (or lack of)
  • Learn from and know each other
  • Discover internal and external datasets
  • criticize & propose alternatives
  • signal things you saw and would like to see implemented in our site. For example, NYT, Bloomberg (1, 2), WP, ProPublica, The Guardian, Financial Times ... have fantastic infographics

We hear you!

  • emails with questions or proposals are a good start
  • you are always welcome to come and chat (but bring your coffee)
  • present at the next Show & Tell

References and Inspirations

Tools

  • Google Charts cannot be run offline
  • GCharts make your life difficult if you want to load data locally, i.e. CSV instead of Google Spreadsheets

Social

Books

Yes, you still have to study!

  • Tufte, Edward
  • Cairo, Alberto
  • Few, Stephen

Credit Where it is Due

Trivia

Automation 1

xkcd 1319 and explanation

Title text: 'Automating' comes from the roots 'auto-' meaning 'self-', and 'mating', meaning 'screwing'.

Automation 2

xkcd 1205 and explanation

Title text: Don't forget the time you spend finding the chart to look up what you save. And the time spent reading this reminder about the time spent. And the time trying to figure out if either of those actually make sense. Remember, every second counts toward your life total, including these right now.

Correlation

xkcd 552 and explanation

Title text: Correlation doesn't imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing 'look over there'.

Data Accuracy 1

Dilbert 2008-05-07

Data Accuracy 2

Dilbert 2008-05-08

Convincing

xkcd 833 and explanation

Title text: And if you labeled your axes, I could tell you exactly how MUCH better.

Big Data
