I've got some data. – Now what?





On Github chagan / data-talk


Slides at chagan.github.io/data-talk

You may think your data look like this

But really, it's this

What does that mean

  • Come with questions
  • Know their bias
    • Who collected this? For what?
    • How sure are they?
  • Can't rely only on the data

More likely, your data are this

Quick bath

  • Take a min, max, sum and average
  • Sort and scan
  • Missing values
  • Change things around
    • Text to columns
    • Convert to numbers, dates, text as needed
    • Pivot tables
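The same quick bath can be sketched outside a spreadsheet. This is a minimal illustration using only Python's standard library. The sample rows, the column names, and the `to_number` helper are made up for the example and are not part of the talk.

```python
import csv
import io
import statistics

# Hypothetical sample data; names, columns, and values are invented for illustration.
RAW = """name,department,salary
Ann,Police,"$90,000"
Bob,Fire,"$85,500"
Cy,Police,
Dee,Streets,"$40,000"
"""


def to_number(text):
    """Convert text such as '$90,000' to a float; return None for blank cells."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return float(cleaned) if cleaned else None


def sort_key(row):
    """Sort by salary, treating missing values as the smallest possible number."""
    value = to_number(row["salary"])
    return value if value is not None else float("-inf")


rows = list(csv.DictReader(io.StringIO(RAW)))
salaries = [to_number(r["salary"]) for r in rows]
present = [s for s in salaries if s is not None]

# Min, max, sum, and average, plus a count of missing values.
summary = {
    "rows": len(rows),
    "missing": len(salaries) - len(present),
    "min": min(present),
    "max": max(present),
    "sum": sum(present),
    "mean": statistics.mean(present),
}

# Sort and scan: highest salaries first, blanks at the bottom.
ranked = sorted(rows, key=sort_key, reverse=True)

print(summary)
print([r["name"] for r in ranked])
```

Counting the blanks separately matters: an average computed over only the non-missing rows can look fine while hiding how much of the column is empty.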

Data smells

  • Talk to the people who collect the data
    • Get the documents behind the data
  • Check previous years
  • Row numbers
    • Excel row limits
    • Round numbers
  • Sample size
  • Null Island
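A few of these smells can be checked mechanically. The sketch below is one possible set of checks, not a method from the talk: the function names and the "multiple of 1,000" threshold are assumptions chosen for illustration. The Excel limits are 65,536 rows for legacy .xls files and 1,048,576 rows for .xlsx files, and Null Island is the point at latitude 0, longitude 0 where records with missing coordinates often end up.

```python
# Worksheet row limits: 65,536 for legacy .xls, 1,048,576 for .xlsx.
EXCEL_ROW_LIMITS = (65_536, 1_048_576)


def row_count_smells(n_rows, header_rows=1):
    """Return warnings for a data row count that hits an Excel limit or looks too round.

    The roundness rule (a multiple of 1,000) is an arbitrary illustrative threshold.
    """
    smells = []
    if n_rows in EXCEL_ROW_LIMITS or n_rows + header_rows in EXCEL_ROW_LIMITS:
        smells.append("matches an Excel row limit; the export may be truncated")
    if n_rows >= 1000 and n_rows % 1000 == 0:
        smells.append("suspiciously round row count; check for a query or export cap")
    return smells


def is_null_island(lat, lon, tolerance=1e-9):
    """True when a coordinate pair sits at (0, 0), a common sign of missing geodata."""
    return abs(lat) <= tolerance and abs(lon) <= tolerance


print(row_count_smells(1_048_575))   # data rows plus one header row hits the .xlsx limit
print(row_count_smells(5000))        # round number
print(row_count_smells(4321))        # nothing flagged
print(is_null_island(0.0, 0.0), is_null_island(41.88, -87.63))
```

A flagged count is a prompt to ask the people who collect the data, not proof that something is wrong.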

Make your life easier

  • Create a data dictionary
  • Make a copy of the original
  • Don't make changes in a cell, create a new column
  • Track your changes
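These habits carry over to scripted cleaning as well. The following sketch assumes a CSV workflow: it leaves the original file untouched, writes a working copy with the cleaned values in a new column, and appends a note to a change log. The function name `add_clean_column`, the file names, and the sample data are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def add_clean_column(original, working, source_col, new_col, transform, log):
    """Read the original, write a working copy with an extra cleaned column.

    The original file is only ever opened for reading.
    """
    with open(original, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = list(reader.fieldnames) + [new_col]
        rows = list(reader)

    for row in rows:
        # Keep the raw value in place; put the cleaned value beside it.
        row[new_col] = transform(row[source_col])

    with open(working, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # Track the change so it can be explained or undone later.
    log.append(f"added {new_col} from {source_col} ({len(rows)} rows)")
    return rows


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "original.csv"
    working = Path(tmp) / "working.csv"
    original.write_text("name,dept\nAnn, police \nBob,FIRE\n")

    change_log = []
    cleaned = add_clean_column(
        original, working, "dept", "dept_clean",
        lambda s: s.strip().title(), change_log,
    )
    original_after = original.read_text()

print(cleaned)
print(change_log)
```

Because the raw column survives next to the cleaned one, any surprising result can be traced back to what the source actually said.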

Visualization

Before you start

Types of charts

  • Why is this not a bar chart?
  • Comparisons
    • Bar charts, line graphs, slope graphs
  • General trends
    • Area (bubbles, pies), shades

Design

  • Clarify, not simplify
  • Limit colors, fonts
  • Interactivity
    • Overview first, zoom and filter, then details-on-demand
    • What is the purpose (what will your users get out of it?)

Tools

When is a map a map

  • The geography is key to the story
    • The interesting trends are tied to the geography
  • That story is clear in the presentation

Tools

Resources

Lab

  • Download the City of Chicago salaries dataset as a CSV (comma-separated values) file
  • Load the file into Google Sheets
  • Who's the highest/lowest paid employee? Anything interesting about that?
  • Use a pivot table to find the department spending the most on salaries. Chart the top 10 in your tool of choice.
  • Find one other interesting thing that might be a story. Post that and your chart to this week's Slack.
  • Stuck? Refer back to "How to 'interview' a big pile of data" on the previous slide.
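For anyone who prefers a script to a spreadsheet, what a pivot table does in this lab (group by department, sum the salaries, sort) can be sketched in a few lines of Python. The column names `Department` and `Annual Salary` and the sample rows are assumptions for illustration; check the headers in the file you actually download.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real dataset's columns and values may differ.
SAMPLE = """Name,Department,Annual Salary
A,POLICE,90000
B,POLICE,80000
C,FIRE,95000
D,STREETS,40000
"""


def salary_by_department(rows, top_n=10):
    """Sum salaries per department and return the top_n totals, largest first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["Department"]] += float(row["Annual Salary"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top_n]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
top_departments = salary_by_department(rows)
highest = max(rows, key=lambda r: float(r["Annual Salary"]))
lowest = min(rows, key=lambda r: float(r["Annual Salary"]))

print(top_departments)
print(highest["Name"], lowest["Name"])
```

With real data, rows with a blank salary would need handling before `float()` is called, which is exactly the kind of thing the quick bath is meant to surface.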