Machine Learning Evaluation with LeVar
A database to make ML evaluation sane
Elias Ponvert
Director of Data Science at People Pattern
What this talk is about
- Machine learning evaluation is a pain, and in my experience we're all terrible at it
- Here's what we should do instead
- Say hello to LeVar, a database designed to help us do what we should do
- Some more about LeVar
- Demo!
- A bit about how it's implemented, if you're interested
- Open issues and next steps
Hello to anybody from the Austin Data Meetup!
Have I seen this talk before?
Yes, you have
Have you gotten 0.2 out yet?
No, I have not
Why not?
New baby. See the appendix.
Machine learning evaluation is a pain and we're all terrible at it
Quick show of hands: does anybody disagree with this statement?
Evaluation is important
- Evaluate early, evaluate often
- Keep a lab notebook
- Evaluate every change
- Do error analysis
- Change your model based on evidence
But here's what happens
- No standard source, storage or format for eval
- End up rewriting our evaluation scripts for each project
- Couple our evaluation code with our core ML code
- Error analysis is a pain
- Hard to track results on the same evaluation over time
- Harder to do comparative error analysis over time
We can do better
- Keep evaluation datasets in a centralized location
- Evaluations are largely immutable
- Not tied to any one ML framework (e.g. scikit-learn)
- ...or big data framework (e.g. Spark RDDs)
- Totally agnostic with respect to method
- Human-readable datasets
- Use a simple and common format for data exchange
- Open schema for data points, i.e. arbitrary features (see the sketch below)
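A rough sketch of what an open-schema datum plus a plain TSV exchange could look like, in Scala. The column layout here (feature columns first, gold label last) is my illustration, not LeVar's actual format:

```scala
// Sketch of an open-schema classification datum exchanged as TSV.
// Column layout (feature columns first, gold label last) is an
// assumption for illustration, not LeVar's actual format.
case class Datum(features: Map[String, String], gold: String)

object TsvExchange {
  // header row names the feature columns; the last column is the gold label
  def parse(tsv: String): Seq[Datum] = {
    val lines  = tsv.trim.split("\n").toSeq
    val header = lines.head.split("\t").toSeq
    lines.tail.map { row =>
      val cells = row.split("\t").toSeq
      Datum(header.init.zip(cells.init).toMap, cells.last)
    }
  }

  def main(args: Array[String]): Unit = {
    val tsv = Seq(
      "text\tsource\tgold",
      "this movie was great\timdb\tpositive",
      "total waste of time\timdb\tnegative"
    ).mkString("\n")
    parse(tsv).foreach(println)
  }
}
```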
We can do better
- Command-line tool for data import & power users
- Several standard problem schemas supported:
  - Classification
  - Regression
  - Geo-prediction
  - Structure prediction
  - Machine translation
- Several standard, useful evaluation criteria supported (see the sketch below):
  - Accuracy
  - Precision/recall/F-score
  - ROC
  - RMSE...
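For reference, here are generic Scala sketches of some of those criteria, using the standard textbook definitions rather than LeVar's own implementation:

```scala
// Generic sketches of the evaluation criteria named above; standard
// definitions, not LeVar's code.
object Metrics {
  def accuracy[A](gold: Seq[A], pred: Seq[A]): Double =
    gold.zip(pred).count { case (g, p) => g == p }.toDouble / gold.size

  // precision/recall/F1 for one target class of a classification problem
  def prf1[A](gold: Seq[A], pred: Seq[A], target: A): (Double, Double, Double) = {
    val tp = gold.zip(pred).count { case (g, p) => g == target && p == target }
    val fp = gold.zip(pred).count { case (g, p) => g != target && p == target }
    val fn = gold.zip(pred).count { case (g, p) => g == target && p != target }
    val precision = if (tp + fp == 0) 0.0 else tp.toDouble / (tp + fp)
    val recall    = if (tp + fn == 0) 0.0 else tp.toDouble / (tp + fn)
    val f1 = if (precision + recall == 0) 0.0
             else 2 * precision * recall / (precision + recall)
    (precision, recall, f1)
  }

  // root mean squared error for regression datasets
  def rmse(gold: Seq[Double], pred: Seq[Double]): Double =
    math.sqrt(gold.zip(pred).map { case (g, p) => math.pow(g - p, 2) }.sum / gold.size)
}
```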
We can do better
- Web UI showing high-level experiment results, suitable for bosses or clients
- Web UI & CLI search for error analysis
- User can comment on anything (see the sketch below):
  - Dataset
  - Experiment
  - Item in dataset
  - Individual prediction in experiment
  - Another comment
- User can label anything
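One way a "comment on anything" model could hang together, keying off the UUID that every record carries. The names here are hypothetical, not LeVar's schema:

```scala
import java.util.UUID
import java.time.Instant

// Illustrative sketch of a "comment on anything" model: a comment points at
// any record's UUID, so it can target datasets, experiments, items,
// predictions, or other comments. Names are hypothetical, not LeVar's schema.
case class Comment(
  id: UUID,        // comments have UUIDs too, so they can be replied to
  subject: UUID,   // UUID of whatever is being commented on
  author: String,
  body: String,
  createdAt: Instant
)

case class Label(subject: UUID, name: String, author: String)
```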
We can do better
- Sensible information architecture
- Organize datasets into groups or organizations
- Provide sensible baseline evaluations out of the box (see the sketch below):
  - Most-common class
  - Mean value
  - (Weighted) random
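Sketches of those baselines in Scala, using the usual definitions (not LeVar's code):

```scala
import scala.util.Random

// Sketches of the out-of-the-box baselines listed above.
object Baselines {
  // classification: always predict the most common gold class
  def mostCommonClass[A](gold: Seq[A]): A =
    gold.groupBy(identity).maxBy(_._2.size)._1

  // regression: always predict the mean gold value
  def meanValue(gold: Seq[Double]): Double =
    gold.sum / gold.size

  // classification: sample a class with probability proportional to its frequency
  def weightedRandom[A](gold: Seq[A], rng: Random = new Random): A =
    gold(rng.nextInt(gold.size))
}
```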
We can do better
- REST API to use with your favorite framework (see the sketch below)
- Straightforward client libraries
- Export and import to other formats (yo RDD)
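For example, hitting a LeVar-style REST endpoint with basic auth from Scala takes only the JDK's HTTP client. The host and path below are placeholders, not LeVar's documented routes:

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.util.Base64

// Sketch of calling a LeVar-style REST API with basic auth. The host and
// endpoint path are hypothetical placeholders, not LeVar's actual routes.
object ApiClientSketch {
  def main(args: Array[String]): Unit = {
    val credentials =
      Base64.getEncoder.encodeToString("username:password".getBytes("UTF-8"))
    val request = HttpRequest.newBuilder(URI.create("https://levar.example.com/api/datasets"))
      .header("Authorization", s"Basic $credentials")
      .GET()
      .build()
    val response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    println(response.body()) // e.g. a JSON listing of datasets
  }
}
```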
Introducing LeVar
It does several of those things!
But don't freak out, it's just v0.1
Here's what's done now
- CLI (in Scala)
- Import/export TSV
- Organizations data model
- API for everything, using basic auth
- Client code for Scala
- Classification & regression datasets
- Many useful evaluations
LeVar data model
- Datasets
- Items (datum)
- Experiments
- Predictions
Everything has a creation date; everything has a UUID
The data model should support most of those other features, so I can already do, e.g., error analysis by dropping into psql and writing a query (see the sketch below).
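For instance, a confusion-style query over an experiment's incorrect predictions might look like the sketch below, run here over JDBC. The table and column names are hypothetical stand-ins for the actual schema; the point is that predictions join back to items by UUID:

```scala
import java.sql.DriverManager

// Sketch of an error-analysis query against the database (needs the
// PostgreSQL JDBC driver on the classpath). Table and column names are
// hypothetical, not the actual schema.
object ErrorAnalysisSketch {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection("jdbc:postgresql://localhost/levar", "levar", "secret")
    val sql =
      """SELECT i.value AS gold, p.value AS predicted, count(*) AS n
        |FROM predictions p JOIN items i ON p.item_id = i.id
        |WHERE p.experiment_id = ? AND p.value <> i.value
        |GROUP BY i.value, p.value
        |ORDER BY n DESC""".stripMargin
    val stmt = conn.prepareStatement(sql)
    stmt.setObject(1, java.util.UUID.fromString(args(0))) // experiment UUID
    val rs = stmt.executeQuery()
    while (rs.next())
      println(s"${rs.getString("gold")} -> ${rs.getString("predicted")}: ${rs.getInt("n")}")
    conn.close()
  }
}
```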
This talk is really only designed to last 30 minutes
We probably have some time left over. Want to look at some baby pics?
Machine Learning Evaluation with LeVar
A database to make ML evaluation sane
Elias Ponvert
Director of Data Science at People Pattern