Server-side mapping

We have several projects that involve processing large geospatial datasets (geo-data) and displaying them on maps. These projects present some interesting technical challenges involving the storage, transfer and processing of geo-data. This post outlines some of the bigger challenges we have encountered and our corresponding solutions.

The challenge

In the past we have used the GMap and OpenLayers libraries and their equivalent Drupal modules on our mapping projects. They are effective solutions when you have a small or even moderately sized collection of entities containing some simple geo-data (points, lines, polygons) that you want to present as vector overlays on a map. Unfortunately they tend to fall apart fast when you use them with larger datasets. There are two main reasons for this:

  1. Geospatial data can be large, particularly as we tend to encode it in text-based formats such as WKT or GeoJSON when we are sending it to a web browser. The larger the data, the longer it takes to transfer from server to client.

  2. The information being sent is raw data which means that the client needs to parse and process the data before rendering it on the screen. The more data there is, the longer this takes.

Making things worse, the geo-data is often sent at the beginning of the HTML document (via Drupal.settings or similar). Most browsers will wait until they have downloaded and parsed this data before they begin to render the rest of the page, increasing the delay.

As a result of the above, it doesn't take much to have a serious negative impact on page load times and little more to actually crash your visitor's browser.

Heavy lifting server-side

A good solution to these issues is to process and render the geo-data as image tiles on the server. Tiles can then be cached and served to the client when requested and the data is only rendered whenever it is changed instead of each time the page is loaded. Bandwidth is also reduced as the image tiles are relatively consistent in size regardless of the complexity or amount of data used to produce them.
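To make the idea of tiles concrete, here is a minimal sketch of how a longitude/latitude pair maps to a tile index. It assumes the common XYZ ("slippy map") scheme on Web Mercator, with the origin at the top-left and 2^zoom tiles along each axis. The function name is ours for illustration, and the post does not state which scheme the demonstration maps use.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the XYZ tile containing a point.

    Assumes the Web Mercator tiling used by most web maps:
    2**zoom tiles per axis, with y counted from the top (north).
    """
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the eastern or polar edges stay inside the grid.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y
```

Because the client only ever asks for the handful of tiles that cover its current viewport, the amount of data transferred depends on the size of the view rather than the size of the dataset.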

As a demonstration we have created two maps containing some sample road data:

Example road data rendered as SVG element

Example road data rendered as img elements containing PNG tiles

I recommend testing these examples in a variety of browsers as their performance varies on the different platforms - particularly for the first example.

There are several components involved in a server-side tile rendering pipeline. They can be loosely categorised under storage, rendering, and caching.

Storage

Geo-data can be stored in a variety of places and formats, each with its own advantages. Here are some that are common:

ESRI Shapefiles

ESRI Shapefiles (commonly known as shapefiles) are a popular file format for storing and transferring geo-data. A shapefile consists of a .shp file accompanied by a collection of other files containing related information, and the set is often bundled together in a zip file.

Well known text (WKT) & GeoJSON

WKT and GeoJSON are formats used to encode geospatial data in plain text, making them convenient to read and parse at the expense of increasing file size.

GeoJSON is a relatively new format. As it is just JSON and therefore easily parsed in JavaScript, it is an increasingly popular format to use when passing raw data to browser-based clients.
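As a rough illustration of the two encodings, the sketch below writes the same line geometry as WKT and as GeoJSON using only the Python standard library. The helper names and the sample coordinates are made up for the example; a real project would more likely rely on a geometry library.

```python
import json


def linestring_to_wkt(coords):
    """Encode a list of (x, y) pairs as a WKT LINESTRING."""
    pairs = ", ".join(f"{x} {y}" for x, y in coords)
    return f"LINESTRING ({pairs})"


def linestring_to_geojson(coords):
    """Encode the same coordinates as a GeoJSON LineString geometry."""
    geometry = {
        "type": "LineString",
        "coordinates": [[x, y] for x, y in coords],
    }
    return json.dumps(geometry)


# Hypothetical two-point road segment (longitude, latitude).
road = [(174.76, -36.85), (174.77, -36.84)]
wkt = linestring_to_wkt(road)
geojson = linestring_to_geojson(road)
```

Both results are plain strings, which is what makes them easy to inspect and also what makes them grow quickly as the number of vertices increases.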

PostGIS

PostGIS is a spatial database extension to the PostgreSQL database management system. The relational database gives you the ability to index, query, and manipulate your data with SQL and an extensive API of geospatial functions.
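As an example of the kind of query this makes possible, the sketch below builds a parameterised bounding-box query in Python. The table name `roads`, the geometry column `geom` and the SRID 4326 are assumptions for illustration only, and the `%s` placeholders follow the style used by drivers such as psycopg2. `ST_MakeEnvelope`, `ST_AsText` and the `&&` bounding-box operator are standard PostGIS features.

```python
def roads_in_bbox_query(min_lon, min_lat, max_lon, max_lat):
    """Build a query for features whose bounding box overlaps a window.

    Returns (sql, params) ready to hand to a DB-API cursor's execute().
    Table and column names here are hypothetical.
    """
    sql = (
        "SELECT id, ST_AsText(geom) AS wkt "
        "FROM roads "
        # && compares bounding boxes and can use a spatial (GiST) index.
        "WHERE geom && ST_MakeEnvelope(%s, %s, %s, %s, 4326)"
    )
    params = (min_lon, min_lat, max_lon, max_lat)
    return sql, params
```

Keeping the coordinates as parameters rather than formatting them into the string leaves quoting to the database driver.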

In Drupal it's common to store your data in fields attached to entities using the Geofield module; however, the data is stored as WKT in a column of type LONGTEXT, and compared to PostGIS it is not very flexible.

We have therefore developed Sync PostGIS which allows site developers to flag entity types with geofields to have their data mirrored in a PostGIS database. The source data in Drupal's main database is retained and treated as best-copy, but all changes (insert, update and delete) are reflected in the PostGIS version. This gives us the ability to utilize PostGIS's rich geospatial features on our Drupal-managed geo-data!

Rendering

Once we have our raw geo-data stored somewhere we need a method of converting it into the images that we will display on our maps. Mapnik is an excellent tool for the job.

Mapnik

Mapnik is an open source C++ library designed to generate map images from a variety of data sources and configurable style rules. Language bindings are available for Python and JavaScript (Node.js), and maps can also be defined in an XML-based stylesheet format.
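A minimal sketch of rendering one tile with the Python bindings might look like the following. It assumes a Mapnik 2.x or later installation (where the bounding-box class is called `Box2d`), a stylesheet whose map projection is Web Mercator, and the XYZ tile scheme; the function names and file paths are placeholders, not part of our actual setup.

```python
import math


def tile_bounds_mercator(x, y, zoom):
    """Web Mercator bounds of an XYZ tile as (minx, miny, maxx, maxy)."""
    half = math.pi * 6378137.0          # half the width of the world in metres
    size = 2.0 * half / (2 ** zoom)     # width of one tile at this zoom
    minx = -half + x * size
    maxy = half - y * size              # y is counted from the top
    return (minx, maxy - size, minx + size, maxy)


def render_tile(stylesheet, x, y, zoom, out_path, tile_size=256):
    """Render a single PNG tile from a Mapnik XML stylesheet."""
    import mapnik  # imported here so the helper above works without Mapnik

    m = mapnik.Map(tile_size, tile_size)
    mapnik.load_map(m, stylesheet)
    m.zoom_to_box(mapnik.Box2d(*tile_bounds_mercator(x, y, zoom)))
    mapnik.render_to_file(m, out_path, "png")
```

A call such as `render_tile("style.xml", 0, 0, 0, "0.png")` would then draw the whole world into a single image, provided the stylesheet and its data sources exist.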

TileMill

TileMill is a desktop application for creating web maps. It is developed by Development Seed to complement their MapBox service. Powered by Mapnik and Node.js it allows users to define style rules using a CSS-like language called CartoCSS. With each change, the rules and data sources are passed to Mapnik and a preview map is rendered giving immediate feedback.

TileMill's main export option renders tiles and packages them in the MBTiles format. However, it can also be used to generate a Mapnik XML stylesheet which other applications can pass to Mapnik to render tiles.

MapBox has a great collection of resources to get you up and running with TileMill. I recommend starting with their crash course.

Caching

So far, we have resolved the bandwidth issues discussed at the beginning of this post by rendering our data into tiles on the server with Mapnik. This also relieves the visitor's web browser of the strain of processing large amounts of raw geo-data. However, generating tiles on the server is also a resource-intensive process; depending on the area and zoom levels you wish to cover, rendering a set of tiles at once can take anywhere from a few seconds to more than a week.
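The wide range in rendering time follows from how quickly the number of tiles grows. The sketch below counts tiles for a whole-world tileset under the usual XYZ scheme, where each zoom level has four times as many tiles as the one before. The rendering rate passed to `render_days` is a made-up input, not a measurement.

```python
def tiles_through_zoom(max_zoom):
    """Total tiles needed to cover the world from zoom 0 to max_zoom."""
    return sum(4 ** z for z in range(max_zoom + 1))


def render_days(max_zoom, tiles_per_second):
    """Days needed to pre-render every tile at a given (assumed) rate."""
    seconds = tiles_through_zoom(max_zoom) / tiles_per_second
    return seconds / 86400.0
```

Covering the whole world through zoom 18 works out to roughly 9 × 10^10 tiles, which is why limiting the area or the number of zoom levels matters so much.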

Obviously we don't want to be rendering tiles from scratch with every request. Instead it is much more efficient to cache the tiles somewhere after they have been rendered and serve requests directly from the cache, only resorting to rendering when a cached tile doesn't exist. There are many ways to cache tiles on your server. Here are some methods that we use:
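The cache-first behaviour described above can be sketched in a few lines. This version uses a directory tree on the file system and takes the renderer as a callable; the function name, the `layer/z/x/y.png` layout and the `render` parameter are illustrative choices, not the internals of any particular tile server.

```python
import os


def get_tile(cache_dir, layer, z, x, y, render):
    """Return tile bytes from the cache, rendering only on a miss.

    `render` is any callable taking (z, x, y) and returning PNG bytes.
    """
    path = os.path.join(cache_dir, layer, str(z), str(x), f"{y}.png")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()

    data = render(z, x, y)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # Write to a temporary name first so a concurrent reader never
    # sees a half-written tile, then move it into place.
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(data)
    os.replace(tmp_path, path)
    return data
```

Invalidating a tile is then just a matter of deleting its file, which forces a fresh render on the next request.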

MBTiles

MBTiles is a file-format specification pioneered by Development Seed. It is essentially a SQLite database containing a whole set of rendered map tiles. Known as tilesets, these files are portable and lightweight and can be generated by TileMill. They are great for caching base layers or layers composed of data that doesn't change frequently. However, they require tiles to be rendered in advance, making them less useful for maps covering large areas and many zoom levels, or for data sources that often require updating.
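Because a tileset is just a SQLite file, reading a tile from it takes only a short query. The sketch below relies on the `tiles` table (or view) defined by the MBTiles specification, with columns `zoom_level`, `tile_column`, `tile_row` and `tile_data`. The specification stores rows in the TMS scheme, so a y index from the more common XYZ scheme has to be flipped. The function name is ours.

```python
import sqlite3


def read_mbtiles_tile(conn, z, x, y):
    """Fetch one tile from an open MBTiles connection, or None if absent.

    (z, x, y) are XYZ indices; MBTiles counts rows from the bottom (TMS).
    """
    tms_y = (2 ** z - 1) - y
    row = conn.execute(
        "SELECT tile_data FROM tiles "
        "WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?",
        (z, x, tms_y),
    ).fetchone()
    return None if row is None else bytes(row[0])
```

Opening a tileset is then `sqlite3.connect("some-tileset.mbtiles")`, where the file name is a placeholder.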

File system

Map tiles are individual image files, usually 256x256 pixels in dimension and rendered in a compressed image format such as .png. In most situations storing them directly on a file system is satisfactory.

Memcache

If you are expecting a lot of requests concurrently, you may want to avoid the file system and cache tiles in memory. Memcache or similar systems are made for this task.

All together

There are plenty of options available for tile servers, including TileCache, TileStache, TileLite, TileStream and Mod Tile. We have been using TileStache as it has an excellent balance of features and simplicity.

TileStache

TileStache is a server application that handles requests for tiles and serves and caches tiles generated from Mapnik or other rendering providers. It's implemented in Python and designed to be extended with a solid plugin system.

Out of the box, its features include:

  • Rendering Mapnik maps
  • Serving MBTiles tilesets
  • Caching tiles to file system, MBTiles, Memcache or Amazon S3
  • Compositing 'layers' into single tilesets

The compositing feature in particular is very powerful. In TileStache's configuration you define a set of 'layers', each layer being a different tileset and effectively its own map. You can then define composite layers, which are new tilesets made up of other layers stacked on top of one another. This allows you to do things like combining a pre-rendered tileset stored in an MBTiles file with a tileset of features stored in PostGIS and serving them to your visitor's browser as one flat set of tiles.
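As a rough sketch of what such a configuration could look like, the Python snippet below builds a TileStache-style configuration and serialises it to JSON. It assumes a TileStache release that includes the "Sandwich" compositing provider; older releases offered compositing through a separate provider with different options, so the key names should be checked against the documentation for the installed version. The layer names, file names and cache path are all hypothetical, and `roads.xml` stands in for a Mapnik stylesheet whose data source would be PostGIS.

```python
import json

config = {
    # Rendered tiles are cached on disk under this (placeholder) path.
    "cache": {"name": "Disk", "path": "/tmp/stache"},
    "layers": {
        # Pre-rendered base map exported from TileMill.
        "base": {
            "provider": {"name": "mbtiles", "tileset": "base.mbtiles"}
        },
        # Features rendered on demand by Mapnik.
        "roads": {
            "provider": {"name": "mapnik", "mapfile": "roads.xml"}
        },
        # Composite layer: "roads" drawn on top of "base".
        "combined": {
            "provider": {
                "name": "Sandwich",
                "stack": [{"src": "base"}, {"src": "roads"}],
            }
        },
    },
}

config_json = json.dumps(config, indent=2)
```

The browser would only ever request tiles from the `combined` layer and never needs to know that two sources were involved.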

Shifting constraints

The range of tools and techniques described provides plenty of flexibility when we are working on mapping projects. It is all achieved without wasting bandwidth or bogging down our visitors' machines with redundant computation.

Previously we had a strict upper limit on the amount of geo-data we could manage and serve, based on the limits of the network and our visitors' hardware. As is evident in this final example, our challenge now is deciding how much data we can fit into our maps without sacrificing their readability.

Multiple layers of data rendered as png tile images displayed with img elements

Comments

Minor note on Geofield

Great article about the challenges of working with tons of geospatial data. A note on Geofield, though. You mention that Geofield stores its data as raw text/wkt data. While that's true with the stable release, we're transitioning to storing our data as wkb, with the ability to store geo data as proper data types in PostGIS when available. You can follow along with our progress (and help out!) on the 7.x-2.x branch of Geofield.

Thanks for the update on Geofield! I wasn't aware of the WKB/PostGIS support work, we'll be sure to keep an eye on the progress.

hi,

i am currently working on a module to allow for server-side clustering of geo-data in drupal 7:
http://drupal.org/project/geocluster

your use case sounds a little more complex than what i'm trying to solve, but maybe there is some overlap in our work.

my current stack for geocluster is based on geofield, views_geojson, leaflet and i plan to implement the geohash-based clustering algorithm as well as an apache solr plugin to be used with search_api.

here's a quick demo of what's working so far:
http://dev.geocluster.gotpantheon.com/maptable

regards

Cool! Clustering is a useful technique for large, dense, point-based datasets to simplify maps and improve readability. By shifting the processing to the server you prevent redundant computation and minimize the processing/rendering load on your clients. I can also see that kind of processing lending itself well to other representations like heat-maps.

In our case we are working with a variety of different geometries in our datasets (lines, polygons and points) so point clustering hasn't been particularly necessary (yet). It would be interesting to look into incorporating a server-side clustering component into the tile rendering pipeline I describe above.

I will keep an eye on Geocluster as we will likely have need of it in the near future.
Thanks!
