On Github briandailey / 2014-coderfaire-maps
Tools for transforming geospatial data.
Most data formats are not suitable for presenting on the web, so you'll want to know how to transform it into something useful.Libraries to quickly view data.
When you're messing with geo data, it's really important to be able to quickly see what you're dealing with (checking periodically that you are not completely on the wrong trail).Libraries to present data.
Leaflet.js and d3 being the two giants here. You want things to look nice and still be available on multiple platforms.How tile servers work.
This used to be something you had to talk about, but now with Mapbox it's not something you *have* to know about.Geographic projections (see XKCD #977)
tl_2013_47_bg.dbf tl_2013_47_bg.prj tl_2013_47_bg.shp tl_2013_47_bg.shp.xml tl_2013_47_bg.shx tl_2013_47_bg.zipShapefiles are more a project folder with a bunch of files than a real format, but it is one of the oldest formats still in use. It was introduced by Esri in early 1990s.
.shp—The main file that stores the feature geometry (required). .shx—The index file that stores the index of the feature geometry (required). .dbf—The dBASE table that stores the attribute information of features (required). There is a one-to-one relationship between geometry and attributes, which is based on record number. Attribute records in the dBASE file must be in the same order as records in the main file. .sbn and .sbx—The files that store the spatial index of the features. .fbn and .fbx—The files that store the spatial index of the features for shapefiles that are read-only. .ain and .aih—The files that store the attribute index of the active fields in a table or a theme's attribute table. .atx—An .atx file is created for each shapefile or dBASE attribute index created in ArcCatalog. ArcView 3.x attribute indexes for shapefiles and dBASE files are not used by ArcGIS. A new attribute indexing model has been developed for shapefiles and dBASE files. .ixs—Geocoding index for read/write shapefiles. .mxs—Geocoding index for read/write shapefiles (ODB format). .prj—The file that stores the coordinate system information (used by ArcGIS). .xml—Metadata for ArcGIS—Stores information about the shapefile. .cpg—An optional file that can be used to specify the code page for identifying the character set to be used.
(Source)"Keyhole Markup Language"
<?xml version="1.0" encoding="UTF-8"?> <kml xmlns="http://www.opengis.net/kml/2.2"> <Document> <Placemark> <name>New York City</name> <description>New York City</description> <Point> <coordinates>-74.006393,40.714172,0</coordinates> </Point> </Placemark> </Document> </kml>
"..XML notation for expressing geographic annotation and visualization..."
{ "type": "Feature", "geometry": { "type": "Point", "coordinates": [125.6, 10.1] }, "properties": { "name": "Dinagat Islands" } }We all know and love (maybe hate?) JSON. It's certainly more lightweight than XML extensions. This is currently the most popular way to express geographic data on the web. Could change tomorrow.
{ "type": "FeatureCollection", "features": [ { "type": "Feature", "geometry": { "type": "Multipolygon", "coordinates": [125.6, ( .... snip .... ) 10.1] }, "properties": { "name": "Dinagat Islands" } }, ... ] }This is more often what you'll see - a collection of features. Note the "coordinates"... this long list of numbers represents the blood, sweat and tears of generations of cartographers. I still view it with awe. The properties array allows all manners of freedom, which we can demonstrate later. (8:30)
Just go with PostgreSQL/PostGIS.
Different databases have varying levels of support. The current gold standard, however, is PostGIS, a PostgreSQL extension that add geometry types to your tables. SQLAlchemy has some support, MySQL's is very limited. PostGIS is where it's at. Easy to set up. It's even included in Postgres.app for Mac OS X.Let's build some maps!
Which parts of Nashville have filed the most housing permits?
Look at permits per blockgroup.
Blockgroups are a geography provided by the US Census that breaks up a population into (generally speaking) equitable chunks. Usually between 600 and 3000 people.Where can we get block groups?
Let's grab that file. (Just not now. Be kind to the WiFi, yo.)
wget ftp://ftp2.census.gov/geo/tiger/TIGER2013/BG/tl_2013_47_bg.zip
Unzipped; yep, looks like a shapefile.
$ unzip tl_2013_47_bg.zip $ ll -rwxrwxr-x 1 brian staff 383K Aug 2 2013 tl_2013_47_bg.dbf -rwxrwxr-x 1 brian staff 165B Aug 2 2013 tl_2013_47_bg.prj -rwxrwxr-x 1 brian staff 36M Aug 2 2013 tl_2013_47_bg.shp -rwxrwxr-x 1 brian staff 19K Aug 2 2013 tl_2013_47_bg.shp.xml -rwxrwxr-x 1 brian staff 32K Aug 2 2013 tl_2013_47_bg.shx
Convert this thing to GeoJSON!
$ ogr2ogr -t_srs "EPSG:4326" -f "GeoJSON" tennessee.geojson tl_2013_47_bg.shp
converts simple features data between file formats
ogr2ogr is part of the GDAL package. It's like the utility knife for Geographic data. I know this looks gnarly, so let's brake it down.ogr2ogr -t_srs "EPSG:4326"Reproject/transform to this SRS on output. SRS is Spatial Reference System Identifier. There are a lot of different ways to project maps, and this particular system is one of the more common, US-Centric ones. We won't spend much time on this, it's an abyss. Choice of projection will be based on where and how large the map will be.
-f "GeoJSON" tennessee.geojson tl_2013_47_bg.shpBasically, direct output to a file in the GeoJSON format. Source is a shapefile, which is the default. Note we pointed to .shp but other files ARE required.
$ ls -l -rw-r--r-- 1 brian staff 62M Sep 4 20:28 tennessee.geojson
iPython 3.4.0 (default, Apr 15 2014, 20:38:59) [GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.38)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import json >>> f = open('tennessee.geojson') >>> j = json.loads(f.read())Since GEOJSON is a subset of JSON, we can just use our favorite programming language to trim this data down to the parts we want - namely, Davidson County.
>>> j.keys() dict_keys(['features', 'type', 'crs']) >>> j['type'] >>> 'FeatureCollection' >>> len(j['features']) 4125In this generated file, we have a feature collection. There are over 4000 features (in this case, block groups).
# note that the properties value stores # useful information like county FIPS (COUNTYFP) >>> j['features'][0]['properties'] {'NAMELSAD': 'Block Group 3', 'BLKGRPCE': '3', 'STATEFP': '47', 'INTPTLAT': '+36.0563099', 'INTPTLON': '-083.8449238', 'COUNTYFP': '093', 'AWATER': 0.0, 'TRACTCE': '005201', 'GEOID': '470930052013', 'MTFCC': 'G5030', 'ALAND': 14162071.0, 'FUNCSTAT': 'S'}
>>> # Davidson County is 037. # number of davidson co block groups >>> len([feature for feature in j['features'] \ if feature['properties']['COUNTYFP'] == '037']) 473 >>> davidson_county_blocks = [feature for feature in j['features'] \ if feature['properties']['COUNTYFP'] == '037'] >>> j['features'] = davidson_county_blocks >>> dc = open('davidson_county_blocks.geojson', 'w') >>> dc.write(json.dumps(j)) 2259120 >>> dc.close()
-rw-r--r-- 1 brian staff 2.2M Sep 4 20:59 davidson_county_blocks.geojsonSo it's still 2.2 MB. Now we could do things like smooth out the lines, in each multipolygon, but that is beyond the scope of this talk. It makes your maps more jagged.
https://github.com/wrobstory/folium
"Python Data. Leaflet.js Maps."
If you're a Python guy, instead of a JS guy, this is an easy way to generate a map.import folium geo_path = r'data/davidson_county_blocks.geojson' bna = folium.Map(location=[36.1881350, -86.7207050], tiles='Mapbox Bright', zoom_start=10) bna.geo_json(geo_path=geo_path) bna.create_map(path='bna.html')The location is the long/lat for Nashville (near where I live).
Need to find out number of permits per blockgroup.
Thankfully, the data has longitude/latitude appended to the address, so we don't have to get into geocoding (that's a whole 'nother talk).for each blockgroup count number of points in block groupUltimately, what we want to do is loop over the block groups and count the number of permits that lie within that block group.
Using Postgress.App on Mac OS X
$ psql -h localhost psql (9.3.5) Type "help" for help. brian=# create database buildings; CREATE DATABASE brian=# \c buildings You are now connected to database "buildings" as user "brian". buildings=# create extension postgis; CREATE EXTENSION buildings=# \qNow that wasn't too hard, was it? PostGIS is not enabled by default. It used to be a lot harder to turn PostGIS on, but it's quite easy these days.
$ shp2pgsql -s 4326 tl_2013_47_bg blockgroups | psql -d buildings $ psql -d buildings buildings=# select count(*) from blockgroups; count ------- 4125 (1 row)This command line tool is included with PostGIS. It takes the shapefile and loads all of them into the blockgroups table, which is geospatially enabled. The -s transformed it to a very commonly used projection, you'll be seeing more of that later.
That wasn't too bad.
Loading up Shapefiles into PostGIS is pretty easy.Unfortunately, the government data is a bit messier.
CREATE TABLE building_permits ( permit INTEGER, permit_type_description VARCHAR(40), parcel VARCHAR(18), date_entered DATE, date_issued DATE, const_cost VARCHAR(12), subdivisionlot VARCHAR(100), contact VARCHAR(60), per_ty VARCHAR(4), ivr_trk INTEGER, purpose TEXT, address VARCHAR(31), city VARCHAR(14), state VARCHAR(2), zip INTEGER, mapped_location VARCHAR(92), longlat geometry(Point, 4326) );Had to eyeball the file to create a decent SQL table container.
buildings=# \copy building_permits (permit, permit_type_description, [...] zip, mapped_location) from 'building_permits_20140828.csv' with csv header; buildings=# select count(*) from building_permits; count ------- 14678 (1 row)
Note that longlat is still NULL.
The longlat was embedded in the address. Bureaucrats, right? What're ya gonna do.Those that have sensitive stomachs may want to look away.
update building_permits set pt = st_pointfromtext('point(' || replace(substring(mapped_location from '[\-0-9\.]+\)'), ')', '') || ' ' || substring(mapped_location from '[0-9]{2}\.[0-9]{10}') || ')', 4326);
Basically, longlat = POINT(X, Y, SRID)
Yes, I threw regex at it. Now I have... more problems.Again, the 4326 is the same SRID you saw for the ogr2ogr tool earlier.
buildings=# select geoid, count(1) as total_permits from blockgroups bg, building_permits bp where st_contains(bg.geom, bp.longlat) group by geoid order by total_permits desc limit 5; geoid | total_permits --------------+--------------- 470370156232 | 272 470370195002 | 267 470370191141 | 266 470370121001 | 260 470370195001 | 252 (5 rows)This tells us that some blockgroups have a lot of permits, but not much else.
#buildings=# \copy (select...) to 'data/total_permits_by_geo.csv' with csv;
import folium import pandas as pd geo_path = r'davidson_county_blocks.geojson' building_permits = pd.read_csv(r'total_permits_by_geo.csv') bna = folium.Map(location=[36.1881350, -86.7207050], tiles="Mapbox Bright", zoom_start=10, width=640) bna.geo_json(geo_path=geo_path, data=building_permits, columns=['GEOID', 'Permits'], key_on='feature.properties.GEOID', threshold_scale=[25, 50, 100, 150, 200], fill_color='YlGn', fill_opacity=0.7, line_opacity=0.2, legend_name='Building Permits') bna.create_map(path='bna.html')
Time to step away from the folium and go raw Leaflet.js...
Let's say that we wanted to see additional information when hovering over each blockgroup. Leaflet has a great example on their website.<!DOCTYPE html> <head> <link rel="stylesheet" href="http://cdn.leafletjs.com/leaflet-0.5/leaflet.css" /> <script src="http://cdn.leafletjs.com/leaflet-0.5/leaflet.js"></script> </head> <body> <div id="map" style="width: 960px; height: 500px"></div> <script> var map = L.map('map').setView([37.8, -96], 4); L.tileLayer('http://{s}.tile.stamen.com/toner/{z}/{x}/{y}.jpg', { maxZoom: 18 }).addTo(map);</script> </body> </html>
We need to pull in our GeoJSON, can use queue and d3...
queue() .defer(d3.json, 'data/building_permits.geojson') .await(makeMap); function makeMap(error, geoJSON) { }
var map = L.map('map').setView([36.188135, -86.720705], 10); davidsonCo = L.geoJson(geoJSON).addTo(map)
davidsonCo = L.geoJson(geoJSON, {style: style, onEachFeature: onEachFeature }).addTo(map)just by adding some additional configuration options we can create methods that add stylistic touches and popups.
function style(feature) { return { fillColor: getColor(feature.properties.building_permits), opacity: 1, color: 'black', fillOpacity: 0.9, weight: 0 }; } function getColor(d) { return d > 250 ? '#800026' : d > 200 ? '#BD0026' : d > 150 ? '#E31A1C' : d > 100 ? '#FC4E2A' : d > 50 ? '#FD8D3C' : d > 20 ? '#FEB24C' : d > 10 ? '#FED976' : '#FFEDA0'; } }Note that I'm keying off the building prop data in each feature properties. Simply added this by munging about the JSON in Python.
function onEachFeature(feature, layer) { if (feature.properties && feature.properties.GEOID && feature.properties.building_permits) { layer.bindPopup(feature.properties.GEOID + " has " + feature.properties.building_permits + " building permits."); } }
d3.geo
d3 makes this really easy. It simply draws geographic features as SVG or Canvas. Note that I don't have a whole lot of expertise here, so if you already know how to do this feel free to point your finger at me and laugh. Let's walk through creating this map with d3.var width = 500; var height = 500; var vis = d3.select("#vis").append("svg") .attr("width", width).attr("height", height)We created a div with the id of "vis", and created an svg container inside it matching our specified height.
d3.json("data/building_permits.geojson", function(json) { ... });Rather than embedding a geojson document inside our HTML, we'll load it with d3.
var center = d3.geo.centroid(json) var scale = 50000; var offset = [width/2, height/2]; var projection = d3.geo.mercator().scale(scale).center(center) .translate(offset);
Helpful tips: www.maori.geek.nz/post/d3_js_geo_fun
We'll center the map on our geojson collection (saves much headache!). You can get more clever about setting scale, but that's beyond the scope here. The projection is set using mercator (which we won't get into, but default is albersUSA), centered on the multipolygon centroid (what's that) translate offset "determines the pixel coordinates of the projection’s center."var path = d3.geo.path().projection(projection);Creates a new geographic path generator.
vis.selectAll("path").data(json.features).enter().append("path") .attr("d", path) .style("fill", "#a6bddb") .style("stroke-width", "0.3") .style("stroke", "white")The magic happens here - SVG path is created from the json.features (inside the feature collection) with the specified style.
var color = d3.scale.quantize() .range([ "rgb(254,229,217)", "rgb(252,174,145)", "rgb(251,106,74)", "rgb(222,45,38)", "rgb(165,15,21)" ]).domain([0, 300]);We can use the d3 quantize scale to create a nice color scale to use. I manually set the min and max for convenience. If you haven't noticed yet, d3 is pretty friggin' awesome.
function choroplethColor(feature) { return color(feature.properties.building_permits); }Next we create a simple func that accepts a feature and returns a color based on the number of building permits.
vis.selectAll("path").data(json.features).enter().append("path") .attr("d", path) .style("fill", choroplethColor) .style("stroke-width", "0.3") .style("stroke", "white")Last, we simply pass the function into the style.
.on("mouseover", function(feature) { d3.select(this).transition().duration(300).style("opacity", 1); div.transition().duration(300) .style("opacity", 1) div.text(feature.properties.building_permits + " building permits") .style("left", (d3.event.pageX) + "px") .style("top", (d3.event.pageY -30) + "px"); }) .on("mouseout", function() { d3.select(this) .transition().duration(300) .style("opacity", 0.8); div.transition().duration(300) .style("opacity", 0); })Simply setting opacity to 1 after mouse over, and positioning the hover div over the event.
div.tooltip { border: 1px solid #ccc; font-family: sans-serif; font-size: 12px; color: #333; width: auto; padding: 10px; position: absolute; pointer-events: none; background-color: #fff; }
$ ls -lc -rw-r--r-- 1 brian staff 2.2M Sep 20 20:44 building_permits.geojson
Three mapshaper.org to get you started.
This online tool allows drag & drop simplification of Geospatial data.ogr2ogr -f GeoJSON simplified.geojson -simplify 0.01 building_permits.geojson
ogr2ogr -f GeoJSON simplified.geojson -simplify 0.001 building_permits.geojson
$ ls -lc -rw-r--r-- 1 brian staff 303K Sep 20 20:43 building_permits.geojson
@byeliad / dailytechnology.net
Slides: dailytechnology.net/2014-coderfaire-maps
Source: github.com/briandailey/2014-coderfaire-maps
Special thanks to Lance Roggendorff for reviewing.