Let's Map It! – Building Maps For The Web



Let's Map It! – Building Maps For The Web

0 1


2014-coderfaire-maps

Slides from my talk at 2014 Coderfaire: "Let's Map It!"

On Github briandailey / 2014-coderfaire-maps

Let's Map It!

Building Maps For The Web

CoderFaire 2014

Brian Dailey / @byeliad

What We'll Talk About

Geospatial data formats.

This is important because if you're going to map data, you need to know what the data might look like. You might have to work with several different data formats. Know your enemy.

Tools for transforming geospatial data.

Most data formats are not suitable for presenting on the web, so you'll want to know how to transform it into something useful.

Libraries to quickly view data.

When you're messing with geo data, it's really important to be able to quickly see what you're dealing with (checking periodically that you are not completely on the wrong trail).

Libraries to present data.

Leaflet.js and d3 being the two giants here. You want things to look nice and still be available on multiple platforms.

What We Won't Talk About

How tile servers work.

This used to be something you had to talk about, but now with Mapbox it's not something you *have* to know about.

Geographic projections (see XKCD #977)

Because we're programmers, by golly, not cartographers.

Why Should You Care?

First, why care? More and more spatial information available from govmt srcs, better and better tools with which to present it. Spatial data is best presented in map, not tabular. (~3m)

It Has Gotten Better

Source: How Simplicity Will Save GIS

This is what it used to be. this is from a talk by Vladimir Agafonkin, the creator of leaflet.

This?

Or This?

With this map you can immediately see where there are holes in the data. You can immediately see where most building permits are concentrated.

Interesting Maps Abound!

One of the strongest reasons for me, personally, is that I see all of these really cool maps and I want to know how to create them!
Making Maps with Python (Trulia) There are all kinds of companies doing interesting things with maps. Trulia can show you affordability, crime rates, and all kinds of other interesting things. In fact, there is a 3 hour Youtube video online by Zain Memom that explains how Trulia did this.
This image tells a story of racial segregation in Nashville almost the moment you see it.
This is NYC in comparison, with some racial divides but overall a much better mix.

GeographicData Formats

Shapefile

tl_2013_47_bg.dbf
tl_2013_47_bg.prj
tl_2013_47_bg.shp
tl_2013_47_bg.shp.xml
tl_2013_47_bg.shx
tl_2013_47_bg.zip
Shapefiles are more a project folder with a bunch of files than a real format, but it is one of the oldest formats still in use. It was introduced by Esri in early 1990s.

An explanation may be in order...

.shp—The main file that stores the feature geometry (required). .shx—The index file that stores the index of the feature geometry (required). .dbf—The dBASE table that stores the attribute information of features (required). There is a one-to-one relationship between geometry and attributes, which is based on record number. Attribute records in the dBASE file must be in the same order as records in the main file. .sbn and .sbx—The files that store the spatial index of the features. .fbn and .fbx—The files that store the spatial index of the features for shapefiles that are read-only. .ain and .aih—The files that store the attribute index of the active fields in a table or a theme's attribute table. .atx—An .atx file is created for each shapefile or dBASE attribute index created in ArcCatalog. ArcView 3.x attribute indexes for shapefiles and dBASE files are not used by ArcGIS. A new attribute indexing model has been developed for shapefiles and dBASE files. .ixs—Geocoding index for read/write shapefiles. .mxs—Geocoding index for read/write shapefiles (ODB format). .prj—The file that stores the coordinate system information (used by ArcGIS). .xml—Metadata for ArcGIS—Stores information about the shapefile. .cpg—An optional file that can be used to specify the code page for identifying the character set to be used.

(Source)
Listen, I know what you're thinking. That looks like a mess. Well, it is. It's great for ESRI and ArcGIS, but it's not really web-friendly.

KML

"Keyhole Markup Language"

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<Placemark>
  <name>New York City</name>
  <description>New York City</description>
  <Point>
    <coordinates>-74.006393,40.714172,0</coordinates>
  </Point>
</Placemark>
</Document>
</kml>

"..XML notation for expressing geographic annotation and visualization..."

A zillion otherformats...

(Many of them proprietary.)

(Most of them XML extensions.)

GPX (used by Garmin), GML standard developed by OpenGeoSpatial... Both of which are XML based. In fact, most of them are XML-based.
I know, right? XML! What is, this, 2004? Are we going to start delivering maps with SOAP and WSDLs? No! There's hope!
That's right. Let the relief wash over you.

GeoJSON

{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [125.6, 10.1]
  },
  "properties": {
    "name": "Dinagat Islands"
  }
}
We all know and love (maybe hate?) JSON. It's certainly more lightweight than XML extensions. This is currently the most popular way to express geographic data on the web. Could change tomorrow.

GeoJSON

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Multipolygon",
        "coordinates": [125.6, ( .... snip .... ) 10.1]
      },
      "properties": {
        "name": "Dinagat Islands"
      }
    },
    ...
    ]
}
This is more often what you'll see - a collection of features. Note the "coordinates"... this long list of numbers represents the blood, sweat and tears of generations of cartographers. I still view it with awe. The properties array allows all manners of freedom, which we can demonstrate later. (8:30)

New Hotness

TopoJSON

"TopoJSON eliminates redundancy, allowing related geometries to be stored efficiently in the same file." https://github.com/mbostock/topojson This is an interesting GeoJSON extension that ultimately deals with the problem of large file sizes. However, it is quite new and not supported everywhere.

Databases

Just go with PostgreSQL/PostGIS.

Different databases have varying levels of support. The current gold standard, however, is PostGIS, a PostgreSQL extension that add geometry types to your tables. SQLAlchemy has some support, MySQL's is very limited. PostGIS is where it's at. Easy to set up. It's even included in Postgres.app for Mac OS X.
Geometry Types So what did I just mean by "geometry types"? Didn't I just talk about formats? Well, this is a little more specific.
Point - a single position. MultiPoint - an array of positions. LineString - an array of positions forming a continuous line. MultiLineString - an array of arrays of positions forming several lines. Polygon - an array of arrays of positions forming a polygon (possibly with holes). MultiPolygon - a multidimensional array of positions forming multiple polygons. GeometryCollection - an array of geometry objects. Feature - a feature containing one of the above geometry objects. FeatureCollection - an array of feature objects. (Source) This vocabulary is good to know because it's fairly consistent across most data formats. Multipolygon is the big one here.

GDAL

A tool you should use

If you plan to do much with geo data, this is a library you should prob. have. Geospatial Data Abstraction Library. Includes all kinds of utilities to make your life easier. (brew install gdal. just not NOW, fools!)

Project time!

Let's build some maps!

The Data - Building Permits

Nashville has pushed a lot of data to data.nashville.gov, this is a data set that (like many others) has long/lat appended to the addresses.

The Question

Which parts of Nashville have filed the most housing permits?

The Method

Look at permits per blockgroup.

Blockgroups are a geography provided by the US Census that breaks up a population into (generally speaking) equitable chunks. Usually between 600 and 3000 people.

First Step

Where can we get block groups?

US Census

TIGER/Line Shapefiles and TIGER/Line Files

The US Census Bureau provides shapefiles of all levels of geography.
Unfortunately, if you go to the site, you'll note that block groups are only available on the state level.
Furthermore, they are in cryptically named zip files in an FTP directory. The numbers are FIPS (Federal Information Processing Standards) codes.
Thanks, bureaucracy, for adding needless layers of complexity!
If you take one look at that directory and run away, I wouldn't blame you. But FIPS codes start to make a little more sense when you get familiar with them. They apply to counties, blockgroups, etc.

TN = 47

Already done the work for you, TN is FIPS 47.

Let's grab that file. (Just not now. Be kind to the WiFi, yo.)

wget ftp://ftp2.census.gov/geo/tiger/TIGER2013/BG/tl_2013_47_bg.zip

Unzipped; yep, looks like a shapefile.

$ unzip tl_2013_47_bg.zip
$ ll
-rwxrwxr-x  1 brian  staff   383K Aug  2  2013 tl_2013_47_bg.dbf
-rwxrwxr-x  1 brian  staff   165B Aug  2  2013 tl_2013_47_bg.prj
-rwxrwxr-x  1 brian  staff    36M Aug  2  2013 tl_2013_47_bg.shp
-rwxrwxr-x  1 brian  staff    19K Aug  2  2013 tl_2013_47_bg.shp.xml
-rwxrwxr-x  1 brian  staff    32K Aug  2  2013 tl_2013_47_bg.shx

Convert this thing to GeoJSON!

$ ogr2ogr -t_srs "EPSG:4326" -f "GeoJSON" tennessee.geojson tl_2013_47_bg.shp

Wait, what?

ogr2ogr

converts simple features data between file formats

ogr2ogr is part of the GDAL package. It's like the utility knife for Geographic data. I know this looks gnarly, so let's brake it down.
ogr2ogr -t_srs "EPSG:4326"

http://www.bostongis.com/?content_name=srid#82

Reproject/transform to this SRS on output. SRS is Spatial Reference System Identifier. There are a lot of different ways to project maps, and this particular system is one of the more common, US-Centric ones. We won't spend much time on this, it's an abyss. Choice of projection will be based on where and how large the map will be.
-f "GeoJSON" tennessee.geojson tl_2013_47_bg.shp
Basically, direct output to a file in the GeoJSON format. Source is a shapefile, which is the default. Note we pointed to .shp but other files ARE required.

Guesses as to file size?

Do you think the GeoJSON will be larger or smaller than the shapefile?
$ ls -l
-rw-r--r--  1 brian  staff    62M Sep  4 20:28 tennessee.geojson
Well, at least it's not XML. Right? ... right?
iPython 3.4.0 (default, Apr 15 2014, 20:38:59)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> f = open('tennessee.geojson')
>>> j = json.loads(f.read())
Since GEOJSON is a subset of JSON, we can just use our favorite programming language to trim this data down to the parts we want - namely, Davidson County.
>>> j.keys()
dict_keys(['features', 'type', 'crs'])
>>> j['type']
>>> 'FeatureCollection'
>>> len(j['features'])
4125
        
In this generated file, we have a feature collection. There are over 4000 features (in this case, block groups).
# note that the properties value stores
# useful information like county FIPS (COUNTYFP)
>>> j['features'][0]['properties']
{'NAMELSAD': 'Block Group 3', 'BLKGRPCE': '3', 'STATEFP': '47',
'INTPTLAT': '+36.0563099', 'INTPTLON': '-083.8449238', 'COUNTYFP': '093',
'AWATER': 0.0, 'TRACTCE': '005201', 'GEOID': '470930052013',
'MTFCC': 'G5030', 'ALAND': 14162071.0, 'FUNCSTAT': 'S'} 
>>> # Davidson County is 037.
# number of davidson co block groups
>>> len([feature for feature in j['features'] \
    if feature['properties']['COUNTYFP'] == '037'])
473
>>> davidson_county_blocks = [feature for feature in j['features'] \
    if feature['properties']['COUNTYFP'] == '037']
>>> j['features'] = davidson_county_blocks
>>> dc = open('davidson_county_blocks.geojson', 'w')
>>> dc.write(json.dumps(j))
2259120
>>> dc.close()
-rw-r--r--  1 brian  staff   2.2M Sep  4 20:59 davidson_county_blocks.geojson
So it's still 2.2 MB. Now we could do things like smooth out the lines, in each multipolygon, but that is beyond the scope of this talk. It makes your maps more jagged.

Great!

Now how do I see this thing?

Folium

https://github.com/wrobstory/folium

"Python Data. Leaflet.js Maps."

If you're a Python guy, instead of a JS guy, this is an easy way to generate a map.
import folium

geo_path = r'data/davidson_county_blocks.geojson'

bna = folium.Map(location=[36.1881350, -86.7207050],
                           tiles='Mapbox Bright', zoom_start=10)
bna.geo_json(geo_path=geo_path)
bna.create_map(path='bna.html')
The location is the long/lat for Nashville (near where I live).
Woohoo! Now this just shows us what the block groups look like. We still want to color them based on the number of permits. That's called a "choropleth" map.

Need to find out number of permits per blockgroup.

Thankfully, the data has longitude/latitude appended to the address, so we don't have to get into geocoding (that's a whole 'nother talk).
for each blockgroup
    count number of points in block group
Ultimately, what we want to do is loop over the block groups and count the number of permits that lie within that block group.

PostgreSQL/PostGIS, Here We Come!

Using Postgress.App on Mac OS X

$ psql -h localhost
psql (9.3.5)
Type "help" for help.

brian=# create database buildings;
CREATE DATABASE
brian=# \c buildings
You are now connected to database "buildings" as user "brian".
buildings=# create extension postgis;
CREATE EXTENSION
buildings=# \q
Now that wasn't too hard, was it? PostGIS is not enabled by default. It used to be a lot harder to turn PostGIS on, but it's quite easy these days.
$ shp2pgsql -s 4326 tl_2013_47_bg blockgroups | psql -d buildings
$ psql -d buildings
buildings=# select count(*) from blockgroups;
count
-------
4125
(1 row)
This command line tool is included with PostGIS. It takes the shapefile and loads all of them into the blockgroups table, which is geospatially enabled. The -s transformed it to a very commonly used projection, you'll be seeing more of that later.

That wasn't too bad.

Loading up Shapefiles into PostGIS is pretty easy.

Unfortunately, the government data is a bit messier.

CREATE TABLE building_permits (
    permit INTEGER,
    permit_type_description VARCHAR(40),
    parcel VARCHAR(18),
    date_entered DATE,
    date_issued DATE,
    const_cost VARCHAR(12),
    subdivisionlot VARCHAR(100),
    contact VARCHAR(60),
    per_ty VARCHAR(4),
    ivr_trk INTEGER,
    purpose TEXT,
    address VARCHAR(31),
    city VARCHAR(14),
    state VARCHAR(2),
    zip INTEGER,
    mapped_location VARCHAR(92),
    longlat  geometry(Point, 4326)
);
Had to eyeball the file to create a decent SQL table container.

Load that crazy CSV file...

buildings=# \copy building_permits (permit, permit_type_description, [...]
    zip, mapped_location) from 'building_permits_20140828.csv' with csv header;
buildings=# select count(*) from building_permits;
 count
-------
 14678
(1 row)

Note that longlat is still NULL.

The longlat was embedded in the address. Bureaucrats, right? What're ya gonna do.

Those that have sensitive stomachs may want to look away.

update building_permits set pt =  st_pointfromtext('point(' ||
    replace(substring(mapped_location from '[\-0-9\.]+\)'), ')', '') ||
    ' ' || 
    substring(mapped_location from '[0-9]{2}\.[0-9]{10}') || ')', 4326);
            

Basically, longlat = POINT(X, Y, SRID)

Yes, I threw regex at it. Now I have... more problems.

Again, the 4326 is the same SRID you saw for the ogr2ogr tool earlier.

buildings=# select geoid, count(1) as total_permits from blockgroups bg,
    building_permits bp where st_contains(bg.geom, bp.longlat)
    group by geoid order by total_permits desc limit 5;
    geoid     | total_permits
--------------+---------------
 470370156232 |           272
 470370195002 |           267
 470370191141 |           266
 470370121001 |           260
 470370195001 |           252
(5 rows)
This tells us that some blockgroups have a lot of permits, but not much else.

Let's map it with Folium again...

Folium allows you to build choropleth maps very easily by combining a CSV file and a GeoJSON file.

Create CSV

#buildings=# \copy (select...) to 'data/total_permits_by_geo.csv' with csv;
import folium
import pandas as pd

geo_path = r'davidson_county_blocks.geojson'
building_permits = pd.read_csv(r'total_permits_by_geo.csv')

bna = folium.Map(location=[36.1881350, -86.7207050],
    tiles="Mapbox Bright", zoom_start=10, width=640)

bna.geo_json(geo_path=geo_path,
    data=building_permits,
    columns=['GEOID', 'Permits'],
    key_on='feature.properties.GEOID',
    threshold_scale=[25, 50, 100, 150, 200],
    fill_color='YlGn', fill_opacity=0.7, line_opacity=0.2,
    legend_name='Building Permits')
bna.create_map(path='bna.html')
Note that the black pages lack any data at all. Lots of contruction in interesting places, but as this map is not clickable (beyond zoom, drag, etc) it's hard to tell how many permits are in each blockgroup.

Time to step away from the folium and go raw Leaflet.js...

Let's say that we wanted to see additional information when hovering over each blockgroup. Leaflet has a great example on their website.

A Simple Map

<!DOCTYPE html>
<head>
   <link rel="stylesheet" href="http://cdn.leafletjs.com/leaflet-0.5/leaflet.css" />
   <script src="http://cdn.leafletjs.com/leaflet-0.5/leaflet.js"></script>
</head>
<body>
        <div id="map" style="width: 960px; height: 500px"></div>
<script>
var map = L.map('map').setView([37.8, -96], 4);

L.tileLayer('http://{s}.tile.stamen.com/toner/{z}/{x}/{y}.jpg', {
    maxZoom: 18
}).addTo(map);</script>
</body>
</html>

We need to pull in our GeoJSON, can use queue and d3...

queue()
      .defer(d3.json, 'data/building_permits.geojson')
      .await(makeMap);

function makeMap(error, geoJSON) { }

http://giscollective.org/d3-queue-js/

var map = L.map('map').setView([36.188135, -86.720705], 10);

davidsonCo = L.geoJson(geoJSON).addTo(map)
davidsonCo = L.geoJson(geoJSON,
    {style: style, onEachFeature: onEachFeature }).addTo(map)
just by adding some additional configuration options we can create methods that add stylistic touches and popups.

The Choropleth

function style(feature) {
    return { fillColor: getColor(feature.properties.building_permits), opacity: 1, color: 'black', fillOpacity: 0.9, weight: 0 };
}

function getColor(d) {
    return d > 250  ? '#800026' :
           d > 200  ? '#BD0026' :
           d > 150  ? '#E31A1C' :
           d > 100  ? '#FC4E2A' :
           d > 50   ? '#FD8D3C' :
           d > 20   ? '#FEB24C' :
           d > 10   ? '#FED976' :
                      '#FFEDA0';
    }
}
Note that I'm keying off the building prop data in each feature properties. Simply added this by munging about the JSON in Python.

The Pop-up

function onEachFeature(feature, layer) {
    if (feature.properties 
        && feature.properties.GEOID 
        && feature.properties.building_permits) {
        layer.bindPopup(feature.properties.GEOID + " has "
            + feature.properties.building_permits + " building permits.");
    }
}

We can also dispense with the tiles.

So far, we've been showing this map with Leaflet. That's great, but sometimes you want to show a simple map with no tile set, like this one.

d3.geo

github.com/mbostock/d3/wiki/Geo

d3 makes this really easy. It simply draws geographic features as SVG or Canvas. Note that I don't have a whole lot of expertise here, so if you already know how to do this feel free to point your finger at me and laugh. Let's walk through creating this map with d3.
var width  = 500;
var height = 500;

var vis = d3.select("#vis").append("svg")
    .attr("width", width).attr("height", height) 
We created a div with the id of "vis", and created an svg container inside it matching our specified height.
d3.json("data/building_permits.geojson", function(json) {
...
});
Rather than embedding a geojson document inside our HTML, we'll load it with d3.
var center = d3.geo.centroid(json)
var scale  = 50000;
var offset = [width/2, height/2];
var projection = d3.geo.mercator().scale(scale).center(center)
    .translate(offset);

Helpful tips: www.maori.geek.nz/post/d3_js_geo_fun

We'll center the map on our geojson collection (saves much headache!). You can get more clever about setting scale, but that's beyond the scope here. The projection is set using mercator (which we won't get into, but default is albersUSA), centered on the multipolygon centroid (what's that) translate offset "determines the pixel coordinates of the projection’s center."
var path = d3.geo.path().projection(projection);
Creates a new geographic path generator.
vis.selectAll("path").data(json.features).enter().append("path")
    .attr("d", path)
    .style("fill", "#a6bddb")
    .style("stroke-width", "0.3")
    .style("stroke", "white")
The magic happens here - SVG path is created from the json.features (inside the feature collection) with the specified style.
Here's the GEOJSON drawn to an SVG with our specified style. Pretty simple.

How about some choropleth?

var color = d3.scale.quantize()
    .range([
    "rgb(254,229,217)",
    "rgb(252,174,145)",
    "rgb(251,106,74)",
    "rgb(222,45,38)",
    "rgb(165,15,21)"
    ]).domain([0, 300]);
We can use the d3 quantize scale to create a nice color scale to use. I manually set the min and max for convenience. If you haven't noticed yet, d3 is pretty friggin' awesome.
function choroplethColor(feature) {
        return color(feature.properties.building_permits);
    }
Next we create a simple func that accepts a feature and returns a color based on the number of building permits.
vis.selectAll("path").data(json.features).enter().append("path")
    .attr("d", path)
    .style("fill", choroplethColor)
    .style("stroke-width", "0.3")
    .style("stroke", "white")
Last, we simply pass the function into the style.
dawww yeah.

D3 mouseover events

d3 can easily add mouseover/mouseout events to the path of each feature. you have to set the style with css.
.on("mouseover", function(feature) {
    d3.select(this).transition().duration(300).style("opacity", 1);
    div.transition().duration(300)
    .style("opacity", 1)
    div.text(feature.properties.building_permits + " building permits")
    .style("left", (d3.event.pageX) + "px")
    .style("top", (d3.event.pageY -30) + "px");
})
.on("mouseout", function() {
    d3.select(this)
    .transition().duration(300)
    .style("opacity", 0.8);
    div.transition().duration(300)
    .style("opacity", 0);
})
Simply setting opacity to 1 after mouse over, and positioning the hover div over the event.
div.tooltip {
border: 1px solid #ccc;
font-family: sans-serif;
font-size: 12px;
color: #333; width: auto;
padding: 10px;
 position: absolute;
 pointer-events: none;
background-color: #fff;
}

But wait, there's more!

$ ls -lc
-rw-r--r--  1 brian  staff   2.2M Sep 20 20:44 building_permits.geojson

2.2MB?!

That's pretty large, and it's certainly not something you want to eat up all your bandwidth serving. Remember that earlier we mentioned simplifying geometries?

Many ways to simplify!

  • Douglas-Peucker
  • Visvalingam / effective area
  • Visvalingam / weighted area

Three mapshaper.org to get you started.

This online tool allows drag & drop simplification of Geospatial data.

ogr2ogr to the rescue!

-simplify (tolerance)
But in our case, we are going to assume you don't know about any of these, you just want to make that file smaller.

Don't get carried away!

ogr2ogr -f GeoJSON simplified.geojson -simplify 0.01 building_permits.geojson
ogr2ogr -f GeoJSON simplified.geojson -simplify 0.001 building_permits.geojson
Much better!
$ ls -lc
-rw-r--r--  1 brian  staff   303K Sep 20 20:43 building_permits.geojson

2.2MB > 303k!

303k is still large, and there is more we could do to tune this, but it gives you an idea of the reduction.

One last thing!

Wrapping Up

Slides: dailytechnology.net/2014-coderfaire-maps

It's been a whirlwind, but we've touched on many of the things you need to know to draw maps on the web. I'll be releasing this as a mini-site later and pushing the code to github. Slides are also going online.

Geodata is still just Data.

The tools have gotten really good.

You should be building maps.

Resources

Thanks!

@byeliad / dailytechnology.net

Slides: dailytechnology.net/2014-coderfaire-maps

Source: github.com/briandailey/2014-coderfaire-maps

Special thanks to Lance Roggendorff for reviewing.