
Machine Learning Evaluation with LeVar

A database to make ML evaluation sane

Elias Ponvert

director of data science at People Pattern

Hello Datapalooza!

What this talk is about

  • Machine learning evaluation is a pain, and in my experience we're all terrible at it
  • Here's what we should do instead
  • Say hello to LeVar, a database designed to help us do what we should do

Some more about LeVar

  • Demo!
  • A bit about how it's implemented, if you're interested
  • Open issues and next steps

Hello anybody from Austin Data Meetup!

Have I seen this talk before?

Yes, you have

Have you gotten 0.2 out yet?

No, I have not

Why not?

New baby. See the appendix.

Machine learning evaluation is a pain and we're all terrible at it

Quick show of hands, does anybody disagree with this statement?

Evaluation is important

  • Evaluate early, evaluate often
  • Keep a lab notebook
  • Evaluate every change
  • Do error analysis
  • Change your model based on evidence

But here's what actually happens

  • No standard source, storage, or format for evaluation data
  • End up rewriting our evaluation scripts for each project
  • Couple our evaluation code with our core ML code
  • Error analysis is a pain
  • Hard to track results on the same evaluation over time
  • Harder to do comparative error analysis over time

Our solution

We can do better

  • Keep evaluation datasets in a centralized location
  • Evaluations are largely immutable
  • Not tied to any one ML framework (e.g. scikit)
  • ...or big data framework (e.g. RDDs)
  • Totally agnostic with respect to ML method
  • Human-readable datasets
  • Use a simple, common format for data exchange (example after this list)
  • Open schema for data points, i.e. arbitrary features
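
For instance, a small classification dataset could travel as a single plain TSV file, one item per row, with arbitrary feature columns plus a gold-label column. This layout is purely illustrative; the actual column conventions are documented in the LeVar repo.

    text                              verified_user    gold_class
    great product, would buy again    true             positive
    arrived broken                    false            negative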

We can do better

  • Command-line tool for data import & power users
  • Several standard problem schemas supported
    • Classification
    • Regression
    • Geo-prediction
    • Structure prediction
    • Machine translation
  • Several standard useful evaluation criteria supported (sketch after this list)
    • Accuracy
    • Precision/recall/F-score
    • ROC
    • RMSE...
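
To make those criteria concrete, here's a minimal Scala sketch of accuracy, single-class precision/recall/F1, and RMSE over paired gold and predicted values. These are illustrative stand-alone functions, not LeVar's actual evaluation code.

    import scala.math.{pow, sqrt}

    object EvalSketch {
      // Fraction of predictions that exactly match the gold labels
      def accuracy[A](gold: Seq[A], pred: Seq[A]): Double =
        gold.zip(pred).count { case (g, p) => g == p }.toDouble / gold.size

      // Precision, recall, and F1 for a single target class
      def prf[A](gold: Seq[A], pred: Seq[A], target: A): (Double, Double, Double) = {
        val pairs = gold.zip(pred)
        val tp = pairs.count { case (g, p) => g == target && p == target }
        val fp = pairs.count { case (g, p) => g != target && p == target }
        val fn = pairs.count { case (g, p) => g == target && p != target }
        val prec = if (tp + fp == 0) 0.0 else tp.toDouble / (tp + fp)
        val rec  = if (tp + fn == 0) 0.0 else tp.toDouble / (tp + fn)
        val f1   = if (prec + rec == 0) 0.0 else 2 * prec * rec / (prec + rec)
        (prec, rec, f1)
      }

      // Root mean squared error for regression predictions
      def rmse(gold: Seq[Double], pred: Seq[Double]): Double =
        sqrt(gold.zip(pred).map { case (g, p) => pow(g - p, 2) }.sum / gold.size)
    }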

We can do better

  • Web UI showing high-level experiment results suitable for bosses or clients
  • Web UI & CLI search for error analysis
  • User can comment on anything:
    • Dataset
    • Experiment
    • Item in dataset
    • Individual prediction in experiment
    • Another comment
  • User can label anything

We can do better

  • Sensible information architecture
    • Organize datasets into groups or organizations
  • Provide sensible baseline evaluations out-of-the-box (sketch after this list)
    • Most-common class
    • Mean value
    • (Weighted) random
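
A minimal sketch of what those baselines amount to, assuming they are computed straight from the gold labels of the dataset; the names here are illustrative, not LeVar's API.

    import scala.util.Random

    object BaselineSketch {
      // Most-common class: always predict the majority gold label
      def mostCommonClass[A](gold: Seq[A]): A =
        gold.groupBy(identity).maxBy(_._2.size)._1

      // Mean value: always predict the mean of the gold values
      def meanValue(gold: Seq[Double]): Double = gold.sum / gold.size

      // Weighted random: drawing uniformly from the gold labels picks each
      // label with probability proportional to its frequency
      def weightedRandom[A](gold: Seq[A], rng: Random = new Random()): A =
        gold(rng.nextInt(gold.size))
    }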

We can do better

  • REST API to use with your favorite framework (sketch after this list)
  • Straightforward client libraries
  • Export and import to other formats (yo RDD)
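
If you don't want the Scala client, any HTTP library will do. Here's a bare-bones sketch of a GET with basic auth in plain Scala; the endpoint path is hypothetical, so see the repo for the real API routes.

    import java.net.{HttpURLConnection, URL}
    import java.util.Base64
    import scala.io.Source

    object ApiSketch {
      // GET a resource from a LeVar server using HTTP basic auth
      def get(base: String, path: String, user: String, pass: String): String = {
        val conn = new URL(base + path).openConnection().asInstanceOf[HttpURLConnection]
        val creds = Base64.getEncoder.encodeToString(s"$user:$pass".getBytes("UTF-8"))
        conn.setRequestProperty("Authorization", s"Basic $creds")
        try Source.fromInputStream(conn.getInputStream).mkString
        finally conn.disconnect()
      }
    }

    // e.g. ApiSketch.get("https://levar.example.com", "/api/datasets", "me", "secret")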

Introducing LeVar

It does several of those things!

But don't freak out, it's just v0.1

Here's what's done now

  • CLI (in Scala)
  • Import/export TSV
  • Organizations data model
  • API for everything, using basic auth
  • Client code for Scala
  • Classification & regression datasets
  • Many useful evaluations

LeVar data model

  • Datasets
  • Items (each one a datum)
  • Experiments
  • Predictions

Everything has a creation date; everything has a UUID

The data model should support most of those other features, so I can already do e.g. error analysis by dropping into psql and writing a query
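
Roughly, the shape of that model in illustrative Scala case classes (not the project's actual types):

    import java.time.Instant
    import java.util.UUID

    // Every record carries a UUID and a creation date
    case class Dataset(id: UUID, name: String, createdAt: Instant)
    case class Item(id: UUID, datasetId: UUID, fields: Map[String, String], createdAt: Instant)
    case class Experiment(id: UUID, datasetId: UUID, name: String, createdAt: Instant)
    // value as String fits classification; a regression prediction would be numeric
    case class Prediction(id: UUID, experimentId: UUID, itemId: UUID, value: String, createdAt: Instant)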

On Github

https://github.com/peoplepattern/LeVar

Demo?

Future directions

Let's just jump over to the issues page

https://github.com/peoplepattern/LeVar/issues

THANKS!

This talk is really only designed to last 30 minutes

We probably have some time left over. Want to look at some baby pics?
