Making Maps @Mapbox
Sean Gillies
Fort Collins Python Web Development Meetup∙July 14, 2014
How Mapbox works
- Making maps
- Map data
- Using maps
Making maps: overlays
Making maps: baselayers
Making maps: tilemill
Mapbox map data
- Streets: OpenStreetMap
- Imagery: DigitalGlobe and other providers
- Common to all customers
Added map data
Local map data
- Not uploaded to Mapbox
- Combined with maps in realtime
- Using API or SDK
Using maps on the web
JS, you say?
Mapbox.js is a great plugin for Leaflet.
Using maps on mobile
Distributed is how we work
- 30 in Washington, DC
- 15 in San Francisco
- 12 remote: US, Canada, Europe, South America
Distributed is really how we work
- GitHub for everything
- for code
- for supplies
- for travel planning
- for business strategy
Python @Mapbox
Mapbox Cloudless Atlas
From cloudy scenes
Declouding
- Scene selection (season and cloudiness)
- Atmospheric correction and reflectance normalization
- Score every pixel of every source image
- No cutlines
- Pixel by pixel
High scoring pixels
Low scoring pixels
Earth is a cloudy planet
- 50-70% mean cloud cover over land
- Landsat 8 doesn't yet have cloud-free imagery everywhere
- So we also use Landsat 7
Scan Line Correction
Landsat SLC-Off Imagery
- Since May 2003
- Without it we don't have enough cloudless pixels
- It complicates things
- A lot
Bands
Debanding
- Transform the image using a Fast Fourier Transform
- Locate the artifact peaks
- Remove the artifacts
- Transform the image back using an inverse transform
Solving the problems
- We have no off-the-shelf solutions
- Trial and error is involved
Cloudless Atlas Software Requirements
- Rapid prototyping and iteration
- Proven algorithms and drivers
- Ability to deploy to 100s of servers
Software stack
- 1 and 3 argue for open source
- 1 argues for a high-level language with handy multi-dimensional array syntax
- 2 argues for LAPACK (&c) and GDAL
- The fit: Linux, Python, Numpy, Scipy
GDAL Python bindings?
- Very C++ style interface
- Failure to keep SWIG and C issues in mind results in crashing programs
- We can do better
Scipy-style raster data library requirements
- Read/write ndarrays from/to data files
- Python types, protocols and idioms instead of C/C++ ones
- Free programmers from having to think about C/C++ (memory management, pointers, &c)
Rasterio's Design
- A Python package at the top
- Extension modules (using Cython) in the middle
- Fast loops, typed memoryviews, "nogil" blocks
- GDAL shared library on the bottom
Reading data
import rasterio
with rasterio.open(path) as src:
bands = [src.read_band(i) for i in src.indexes]
open()
gives you a file-like dataset object
read_band()
gives you a Numpy ndarray
- Read windows of data with extended slice-like syntax
Writing data
kwargs = src.meta
with rasterio.open(path, 'w', **kwargs) as dst:
for i, arr in zip(dst.indexes, bands):
dst.write_band(i, arr)
- Get keyword args needed to open a dataset for writing from another dataset
write_band()
takes an ndarray
- You can also write to windows of a dataset
Georeferencing
Rasterio follows the lead of pyproj
>>> import rasterio
>>> src = rasterio.open('rasterio/tests/data/RGB.byte.tif')
>>> src.crs
{'units': 'm', 'zone': 18, 'ellps': 'WGS84', 'proj': 'utm', 'no_defs': True}
rasterio.features
rasterio.features.shapes()
yields all the features of an array as GeoJSON-like objects
rasterio.features.rasterize()
"burns" GeoJSON-like objects into an array
- Dicts, iterators, tuples, ndarrays
- No datasets or layers necessary
{'coordinates': [[(71.0, 6.0), ...]], 'type': 'Polygon'}, ...
rasterio.warp
rasterio.warp.reproject()
maps elements of one array to another, using cartographic projections
- No datasets or layers required
- Data created in non-GIS programs can be reprojected for use with GIS programs