Building better metrics for news







Slides for a guest lecture at Medill

On GitHub: abelsonlive / medill-2014


Brian Abelson | @brianabelson
Data Scientist, Enigma | Fellow, Tow Center
Former Knight-Mozilla OpenNews Fellow, The New York Times
Slides: brianabelson.com/scpr-2013

James Watt / Steam Engine

Horsepower

Horsepower now

What metrics are for and the effects they have.

  • Metrics are for communicating complex concepts in interpretable, actionable terms
  • As a given metric is codified, it comes to actively shape and define its context, often being manipulated in ways that lead to unforeseen outcomes

What does this mean for News?

  • Pageviews:
    • Closely related to circulation size
    • Widely implemented and easy to measure
    • Of course, there have been many consequences:

"Pageviews are dead"

Remind you of anything?

So how do we construct an alternative?

  • A good metric should:
    • Reflect an organization's well-defined goal(s).
    • Be widely applicable.
    • Provide actionable intelligence while remaining interpretable.
    • Minimize externalities.
  • Also, read: stdout.be/68

Pageviews Above Replacement (un-juking the stats)

  • What if we could control for promotion when judging performance?
  • From July to August, I collected data on the promotion and performance of over 21,000 articles published on nytimes.com
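One way to read "pageviews above replacement" is as a residual: an article's observed pageviews minus the pageviews a model would expect from its promotion alone. The sketch below assumes that reading; the function name, article IDs, and numbers are hypothetical, and the expected values would come from a predictive model like the one described under "Predicting pageviews".

```python
def pageviews_above_replacement(observed: float, expected: float) -> float:
    """Hypothetical PAR: pageviews beyond what promotion alone predicts."""
    return observed - expected


# Hypothetical articles: (id, observed 7-day pageviews, model-expected pageviews)
articles = [
    ("a1", 12000, 9000),
    ("a2", 4000, 6500),
    ("a3", 800, 700),
]

par = {aid: pageviews_above_replacement(obs, exp) for aid, obs, exp in articles}
# Rank articles by how far they beat (or missed) their expected pageviews
ranked = sorted(par, key=par.get, reverse=True)
print(ranked)  # ['a1', 'a3', 'a2']
```

Ranked this way, a heavily promoted article that merely meets expectations scores near zero, while a lightly promoted one that draws more readers than expected moves up.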

Data sources

  • Promotional data:
    • ~200 NYT-related Twitter accounts
    • ~20 NYT-related Facebook accounts
    • ~20 section fronts
    • One homepage
    • The print edition
  • Metadata:
    • Article type (video, slideshow, interactive, article, blogpost)
    • Section (US, World, Art, etc.)
  • Performance data:
    • Pageviews and social media activity for each article
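To feed these sources into a model, each article's promotion, metadata, and performance data has to be flattened into one numeric row plus a target. A minimal sketch, assuming hypothetical field names and a small hypothetical section vocabulary; the actual dataset's schema is not shown in these slides.

```python
from dataclasses import dataclass, field

# Hypothetical category vocabularies used for one-hot encoding
ARTICLE_TYPES = ("video", "slideshow", "interactive", "article", "blogpost")
SECTIONS = ("US", "World", "Art")


@dataclass
class ArticleRecord:
    # Metadata
    article_type: str
    section: str
    # Promotional data (hypothetical field names)
    homepage_minutes: float = 0.0
    section_front_minutes: float = 0.0
    section_fronts: set = field(default_factory=set)
    in_print: bool = False
    nyt_tweets: int = 0
    followers_reached: int = 0
    # Performance data: the prediction target, not a feature
    pageviews_7d: int = 0

    def features(self) -> list:
        """Flatten promotion and metadata into one numeric row."""
        promotion = [
            self.homepage_minutes,
            self.section_front_minutes,
            float(len(self.section_fronts)),  # number of unique section fronts
            1.0 if self.in_print else 0.0,
            float(self.nyt_tweets),
            float(self.followers_reached),
        ]
        type_one_hot = [1.0 if self.article_type == t else 0.0 for t in ARTICLE_TYPES]
        section_one_hot = [1.0 if self.section == s else 0.0 for s in SECTIONS]
        return promotion + type_one_hot + section_one_hot


# Usage: one hypothetical article becomes a (feature row, target) pair
record = ArticleRecord("video", "World", homepage_minutes=30,
                       section_fronts={"world", "us"}, in_print=True,
                       nyt_tweets=2, followers_reached=1000, pageviews_7d=5400)
row, target = record.features(), record.pageviews_7d
```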

Predicting pageviews

  • Sum each article's pageviews over 7 days on the site
  • Use promotional features and article metadata to predict this number
  • Random Forests (the average of a bunch of decision trees, each grown on a random sample of the data)
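The slides do not include the model code. As a rough illustration of the technique, here is a small from-scratch random forest regressor in plain Python: each tree is grown on a bootstrap sample, each split considers only a random subset of features, and the forest predicts the mean of its trees' predictions. The class, function, and parameter names are hypothetical; in practice a library implementation such as scikit-learn's `RandomForestRegressor` would be the usual choice.

```python
import random
from statistics import mean


def build_tree(rows, targets, depth, max_depth, n_try, rng):
    """Grow a regression tree. Leaves are numbers; splits are (feature, threshold, left, right)."""
    if depth >= max_depth or len(rows) < 4 or len(set(targets)) == 1:
        return mean(targets)
    best = None  # (sum of squared errors, feature, threshold)
    # Random forests consider only a random subset of features at each split
    for f in rng.sample(range(len(rows[0])), n_try):
        for threshold in {r[f] for r in rows}:
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if not left or not right:
                continue
            left_mean, right_mean = mean(left), mean(right)
            sse = (sum((t - left_mean) ** 2 for t in left)
                   + sum((t - right_mean) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, f, threshold)
    if best is None:  # no usable split among the sampled features
        return mean(targets)
    _, f, threshold = best

    def subset(keep_left):
        pairs = [(r, t) for r, t in zip(rows, targets) if (r[f] <= threshold) == keep_left]
        return [p[0] for p in pairs], [p[1] for p in pairs]

    left_rows, left_targets = subset(True)
    right_rows, right_targets = subset(False)
    return (f, threshold,
            build_tree(left_rows, left_targets, depth + 1, max_depth, n_try, rng),
            build_tree(right_rows, right_targets, depth + 1, max_depth, n_try, rng))


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's mean target."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node


class RandomForestRegressor:
    """Hypothetical minimal forest: bagged trees with random feature subsets."""

    def __init__(self, n_trees=25, max_depth=5, n_try=None, seed=0):
        self.n_trees, self.max_depth, self.n_try, self.seed = n_trees, max_depth, n_try, seed
        self.trees = []

    def fit(self, rows, targets):
        rng = random.Random(self.seed)
        # A common default for regression is about a third of the features
        n_try = self.n_try or max(1, len(rows[0]) // 3)
        self.trees = []
        for _ in range(self.n_trees):
            # Bootstrap: sample rows with replacement
            idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
            self.trees.append(build_tree([rows[i] for i in idx], [targets[i] for i in idx],
                                         0, self.max_depth, n_try, rng))
        return self

    def predict(self, row):
        # Regression forests average the trees' predictions
        return mean(predict_tree(tree, row) for tree in self.trees)


# Toy data: the target is driven by the first feature (think minutes on the homepage)
rows = [[i, (7 * i) % 5] for i in range(40)]
targets = [100 * r[0] for r in rows]
forest = RandomForestRegressor().fit(rows, targets)
low, high = forest.predict([5, 0]), forest.predict([35, 0])
```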

Variable importance

  • Time on all section fronts
  • Number of unique section fronts
  • Was the article in the paper?
  • Number of NYT-Twitter followers reached
  • Time on homepage
  • Number of NYT-tweets
  • Is the article from Reuters?
  • Is the article from the AP?
  • Max rank on homepage
  • Word count
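The slides don't say how importance was measured. One common, model-agnostic approach is permutation importance: shuffle one feature's column and see how much the prediction error grows. A minimal sketch, assuming `predict` is any function from a feature row (a list) to a number, such as a trained model's predict method; the function names and toy data are hypothetical.

```python
import random
from statistics import mean


def mse(predict, rows, targets):
    """Mean squared error of predict() over a dataset."""
    return mean((predict(r) - t) ** 2 for r, t in zip(rows, targets))


def permutation_importance(predict, rows, targets, n_repeats=5, seed=0):
    """Average increase in MSE when each feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(predict, rows, targets)
    scores = []
    for f in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[f] for r in rows]
            rng.shuffle(column)
            # Rebuild each row (rows must be lists) with the shuffled value swapped in
            shuffled = [r[:f] + [v] + r[f + 1:] for r, v in zip(rows, column)]
            increases.append(mse(predict, shuffled, targets) - baseline)
        scores.append(mean(increases))
    return scores


# Toy model whose output depends only on the first feature
def toy_predict(row):
    return 3 * row[0]


rows = [[i, i % 3] for i in range(20)]
targets = [3 * i for i in range(20)]
scores = permutation_importance(toy_predict, rows, targets)
```

Shuffling a feature the model ignores leaves its error unchanged (importance 0), while shuffling one it relies on drives the error up. Because this only needs predictions, it works the same for a random forest as for any other model.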

So what?

  • Placing promotional data alongside pageviews gives us a better understanding of what the metric actually means.
  • At the NYT, pageviews are actually fairly predictable: my model explained 90% of their variance.
  • Incorporating this approach in your newsroom should be fairly painless with particle. However, you should first ask yourself what you're optimizing for.
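"Variance explained" is usually reported as the coefficient of determination, R² = 1 - SS_res / SS_tot: one minus the model's squared error divided by the squared deviation of the observations from their mean. A minimal sketch with hypothetical numbers; for an honest figure it should be computed on articles the model did not train on (random forests also offer out-of-bag estimates for this).

```python
from statistics import mean


def r_squared(observed, predicted):
    """Share of the variance in observed values explained by the predictions."""
    y_bar = mean(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - y_bar) ** 2 for y in observed)
    return 1 - ss_res / ss_tot


# Hypothetical 7-day pageviews vs. model predictions
r2 = r_squared([100, 200, 300, 400], [110, 190, 310, 390])
print(round(r2, 3))  # 0.992
```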

Other Approaches - FAST

Other Approaches - Engaged Time

Other Approaches - NSA

Other Approaches - Measuring Impact

Thanks!

@brianabelson | brianabelson.com | OpenNews | Enigma