Overview of Vector Tiles – Transcription

Next up, we have Paul Norman, talking about an overview of Vector Tiles.
Paul Norman: I'm Paul Norman. I'm an OpenStreetMap developer, and I work on the default style sheet on OSM.org. Through various jobs, I'm also involved with vector tiles at a number of companies. There is still a lot of confusion about what vector tiles are and what they are good for. As of yet, there is no standardization of software like there has been for raster tiles. So this talk goes over some of the various options that exist.

So what are tiles in general? When you have a map, you typically cut it up into tiles. It is cut up in a defined way, and it is important that the scheme is predefined for caching; that's the really important part about tiles. In OpenStreetMap, we commonly use the XYZ scheme for tiles, which is the same as Google's, and you will see it in URLs like this. The zoom, X and Y define what area of the world a particular tile covers.

There are a few ways to use raster tiles, and they are obvious: you have images, you stick them together and display them to the user. This can be in browsers, with Leaflet, and there are other ways, too, if you're on a phone or in an application. They all do exactly the same thing. They download the images, put them together in the right place, and let you zoom in and out and pan around. They generally add features on top of that, like points and whatever the reason you have a map is, as some kind of overlay.

So, how do we use vector tiles? That's not really the right question. What we do with vector tiles is where you want to start. You can do analysis, and vector tiles are good for some types of analysis because they are very parallel. But many kinds of analysis cannot be done within the context of a single vector tile; you have to look at adjacent ones. So they are good for some types of analysis, but not others. You can also take vector tiles and turn them into raster tiles.
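The XYZ scheme mentioned above can be made concrete with a short Python sketch of the usual Web Mercator tile arithmetic. It assumes 2^z by 2^z tiles at zoom z, with y counted from the north. The function names are illustrative and do not come from any particular library. The hypothetical `neighbours` helper shows why some analysis cannot stay inside one tile: a feature near a tile edge needs data from the up to eight tiles around it.

```python
import math


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) of the XYZ tile containing a WGS84 point at this zoom."""
    n = 2 ** zoom  # number of tiles along each axis
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so lon = 180 and latitudes beyond the Mercator limit stay on valid tiles.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_to_lonlat(x: int, y: int, zoom: int) -> tuple[float, float]:
    """Return the (lon, lat) of the tile's north-west corner."""
    n = 2 ** zoom
    lon = x / n * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * y / n))))
    return lon, lat


def neighbours(x: int, y: int, zoom: int) -> list[tuple[int, int]]:
    """Distinct tiles adjacent to (x, y); x wraps at the antimeridian, y does not."""
    n = 2 ** zoom
    found = set()
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            nx, ny = (x + dx) % n, y + dy
            if 0 <= ny < n and (nx, ny) != (x, y):
                found.add((nx, ny))
    return sorted(found)
```

The resulting numbers slot into a URL template. For example, OSM's standard raster tiles use the pattern `https://tile.openstreetmap.org/{z}/{x}/{y}.png`.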
This has advantages when you are making maps in different styles, or scaling up for high-resolution devices. So even though you end up with raster tiles, this approach has some different advantages. A third common task is sending your vector tiles as vector tiles to a client device, such as a browser or mobile device. The client then renders them itself, which these days involves sending them to the graphics card. Ultimately, that gets you to pixels on your screen.

Now, no one set of vector tiles is ideal for all of these tasks. There are compromises involved in designing a schema. A schema is the layer names, the attributes, all of the stuff that the cartographer designing the tiles decided to include. There is no one schema for everything. Some open examples of schemas are OpenMapTiles and Mapzen's vector tiles. Those two schemas are both ultimately designed to render a map, but they were designed with different priorities. They classify roads differently, and their roads appear at different zooms, so they are not interchangeable. There are also closed options, like Mapbox Streets and Thunderforest's, which are designed for different purposes. And there are other schemas for tiles where you are planning to do analysis, whereas the ones I mentioned are all designed for the same broad goal of displaying a map.

So the overall strength of vector tiles is that they are quick for rendering multiple similar maps. That is why Mapbox developed their vector tile stuff. As a consequence of that, they are quick for rendering maps for print, which is a problem with some traditional map rendering options. They are also good for client-side rendering, which is why they were developed at MapQuest, one of the first companies to use vector tiles. There are weaknesses, though. The big one is that they are more complicated, much more complicated.
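One way to see why schemas are "not interchangeable" is to treat a schema as data: layer names plus the attributes each layer carries. In the Python sketch below, both schemas are hypothetical and heavily simplified. They are not the real OpenMapTiles or Mapzen definitions. A style written against one schema is checked against another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    attributes: frozenset[str]


# Hypothetical, simplified schemas; real schemas have many more layers and attributes.
SCHEMA_A = {layer.name: layer for layer in [
    Layer("transportation", frozenset({"class", "oneway"})),
    Layer("water", frozenset({"class"})),
]}
SCHEMA_B = {layer.name: layer for layer in [
    Layer("roads", frozenset({"kind", "is_bridge"})),
    Layer("water", frozenset({"kind"})),
]}


def unmet_requirements(needs: dict[str, set[str]], schema: dict[str, Layer]) -> list[str]:
    """List the layers and attributes a style needs that this schema does not provide."""
    problems = []
    for layer, attrs in sorted(needs.items()):
        if layer not in schema:
            problems.append(f"missing layer {layer!r}")
            continue
        for attr in sorted(attrs - schema[layer].attributes):
            problems.append(f"layer {layer!r} lacks attribute {attr!r}")
    return problems


# A style written against schema A.
style_needs = {"transportation": {"class"}, "water": {"class"}}
print(unmet_requirements(style_needs, SCHEMA_A))
print(unmet_requirements(style_needs, SCHEMA_B))
```

Against schema A, the list is empty. Against schema B, every rule that touches roads or water would have to be rewritten. That is the porting work behind the vendor lock-in the talk goes on to describe.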
This talk would be a quarter of the length if it were about raster tiles, because there are obvious choices for everything. Because you have schemas, you have a higher risk of vendor lock-in. If you develop your style to work with a particular set of vector tiles, you are tied to it, particularly if it is not an open set that you could reproduce yourself. So you are locked into that particular vendor, unless you want to do a bunch of work to port your style to something else. There are also no mature, standardized tools. I see this in the questions I'm asked regularly, and that's why I'm giving this talk. Another weakness, related to this, is that very few people are doing this independently. With raster tiles, there are lots of people, including local OSM chapters, rendering raster tiles for their maps, and you can follow what they've done and do it yourself. The last thing is client-side rendering, which I listed as a strength. It can also be a problem in some situations, for example with slower mobile devices, because rendering client-side takes more CPU and graphics power on the client.

So that is what vector tiles are in general. But when we get to specifics, there are a few different formats. You can have GeoJSON, which is a standardized format for geodata. It has good support and lots of tools, but it is not the most space-efficient, so people have moved to other options. TopoJSON is another standardized format. It is more efficient, but the client support is not nearly as good, and it does some tricks to reduce duplication between multiple geometries. The third big format is the Mapbox Vector Tile format, or MVT. It is a binary format, which makes it harder to debug, because you cannot open a file and look at it. It is designed for native apps and things like that, though you can use it with JavaScript, too. Because it is designed for that, it is quite efficient at it.
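To show what "binary and efficient but hard to debug" means for MVT, here is a Python sketch of how the Mapbox Vector Tile specification encodes a LineString's geometry. Each command integer packs a command id and a repeat count. Coordinates are integer deltas from the previous cursor position, and those deltas are ZigZag-encoded. The sketch covers only the geometry array, not the surrounding protobuf layers and features, and the helper names are illustrative.

```python
MOVE_TO, LINE_TO = 1, 2  # ClosePath (7) is only needed for polygons


def command_integer(command_id: int, count: int) -> int:
    """Pack a command id (low 3 bits) and its repeat count (remaining bits)."""
    return (command_id & 0x7) | (count << 3)


def zigzag(n: int) -> int:
    """ZigZag-encode a signed delta: 0->0, -1->1, 1->2, -2->3, ..."""
    return 2 * n if n >= 0 else -2 * n - 1


def encode_linestring(points: list[tuple[int, int]]) -> list[int]:
    """Encode one LineString, given in integer tile coordinates, as MVT geometry."""
    if len(points) < 2:
        raise ValueError("a LineString needs at least two points")
    geometry: list[int] = []
    cursor_x, cursor_y = 0, 0  # each coordinate is stored relative to the cursor
    for i, (x, y) in enumerate(points):
        if i == 0:
            geometry.append(command_integer(MOVE_TO, 1))
        elif i == 1:
            geometry.append(command_integer(LINE_TO, len(points) - 1))
        geometry += [zigzag(x - cursor_x), zigzag(y - cursor_y)]
        cursor_x, cursor_y = x, y
    return geometry


print(encode_linestring([(2, 2), (2, 10), (10, 10)]))  # [9, 4, 4, 18, 0, 16, 16, 0]
```

The printed array matches the LineString example in the MVT specification. Small integers like these compress well, but nothing in them is human-readable without a decoder.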
There are some other formats, which are pretty specialized. OSciMap, from OpenScienceMap, is an older binary format that is not used so much. Esri JSON you only use if you are an Esri shop. All of the different ways to serve vector tiles rely on basically the same architecture. You have a database, which is going to be a PostgreSQL database loaded with osm2pgsql, and then you have, circled in red, some kind of server. It produces vector tiles, which you then do something with, like turning them into raster tiles, or nothing at all. Then you send them out to a content distribution network, and on to the client.

So, in general, these all work the same way. There is a generator, something that queries the database, comes up with a bunch of features for different layers, and then writes them into a tile. That tile is put into a tile store, which is something like S3 or Cassandra, something that stores objects. It can also be files on disk; that works fine. Then you have a server that receives the request. If the tile is in the store, it fetches it from there; otherwise it gets it from the generator, which generates it. Depending on the exact architecture, this can work a few different ways in terms of exactly which components talk to which. And then your other stuff is generally either doing nothing, or turning the tile into a raster tile.

So what are some options for generating these vector tiles? I did a blog post outlining the options, and it has more detail on some of these. There are half a dozen major options you can consider, and some others that are more specialized. The first of these is first because it is a bit different. It is Tilemaker, developed by Richard Fairhurst, who developed Potlatch and Potlatch 2 and started iD. It takes a unique approach: it reads in the entire OSM planet file at once and produces an entire set of vector tiles at once.
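The generator, tile store and server described above form a cache-aside pattern. In the Python sketch below, a dict stands in for the tile store (S3, Cassandra or files on disk), and `fake_generate` is a hypothetical generator. A real one would run SQL against the osm2pgsql database and encode the result as a vector tile.

```python
from typing import Callable, Dict, Iterable, Tuple

TileKey = Tuple[int, int, int]  # (zoom, x, y)


class TileService:
    """Serve tiles from a store, falling back to a generator on a miss."""

    def __init__(self, generate: Callable[[int, int, int], bytes]) -> None:
        self.store: Dict[TileKey, bytes] = {}  # stand-in for S3, Cassandra or disk
        self.generate = generate

    def get(self, z: int, x: int, y: int) -> bytes:
        key = (z, x, y)
        tile = self.store.get(key)
        if tile is None:
            # Miss: ask the generator to build the tile, then keep it for next time.
            tile = self.generate(z, x, y)
            self.store[key] = tile
        return tile

    def expire(self, keys: Iterable[TileKey]) -> None:
        """Drop stale tiles so the next request regenerates them."""
        for key in keys:
            self.store.pop(key, None)


# Hypothetical generator that records how often it is called.
calls = []


def fake_generate(z: int, x: int, y: int) -> bytes:
    calls.append((z, x, y))
    return f"tile {z}/{x}/{y}".encode()


service = TileService(fake_generate)
service.get(1, 0, 1)
service.get(1, 0, 1)
print(len(calls))  # 1: the second request is served from the store
```

Keeping the store separate from the generator is what lets serving be simple and reliable while generation stays the complicated part.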
This makes it very easy in some ways, because you can then take the set of vector tiles and serve them out as files. There are lots of reliable, easy ways to do that, which means operations doesn't get paged at 3:00 in the morning. That is because serving and generation are detached. But it has inherent problems. It is unable to do incremental updates because it has to process the planet at once, and because of that, it does not do well with big data sets. You would do fine with the U.S., or with California, but if you put the entire planet through it, you will not have enough RAM.

The most traditional option is Mapnik, which was made for generating raster maps. For Mapbox, vector tiles basically started out as taking how Mapnik represents things internally and writing that out to a file. I would say it is the most common option, the one that most people know. It is a library, not software by itself. It is used in Kartotherian at Wikimedia and in Tessera (Seth, who works on it, is around today). It also has tools for developing styles with it, like Kosmtik, and all of that comes from using Mapnik. When you write the rules to generate the tiles, you write them in the tm2source format. I will speed up a bit: that format is SQL in YAML. When I was evaluating this, I was looking for something suited to styles with multiple contributors. With SQL in YAML, you can have multiple people working and combining their work without horrible merge conflicts and stuff like that. But the development of Mapnik is basically dead. There used to be a community of developers around it, which went away a couple of years ago, and Mapbox has moved away from using Mapnik in a key role. One plus is that most of the servers have full support for incremental updates.

TileStache is another option. It is a well-established tile server that has been around for a number of years, but its vector support is very new.
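Incremental updates come down to working out which tiles an edit touches and regenerating only those. A whole-planet approach like Tilemaker's instead rebuilds everything. The Python sketch below assumes the changed area is known as a lon/lat bounding box that does not cross the antimeridian. The helper names are illustrative, and real servers track expiry in their own ways. The `tile_count` function shows the scale of a full rebuild: zooms 0 to 14 alone come to (4^15 - 1) / 3 = 357,913,941 tiles.

```python
import math


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """XYZ tile containing a WGS84 point (standard Web Mercator tiling)."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_to_expire(west: float, south: float, east: float, north: float,
                    min_zoom: int, max_zoom: int) -> set[tuple[int, int, int]]:
    """All (z, x, y) tiles overlapping a changed bounding box, zoom by zoom."""
    expired = set()
    for z in range(min_zoom, max_zoom + 1):
        x0, y0 = lonlat_to_tile(west, north, z)  # north-west corner: smallest x and y
        x1, y1 = lonlat_to_tile(east, south, z)  # south-east corner: largest x and y
        for x in range(x0, x1 + 1):
            for y in range(y0, y1 + 1):
                expired.add((z, x, y))
    return expired


def tile_count(max_zoom: int) -> int:
    """Total number of tiles from zoom 0 up to and including max_zoom."""
    return sum(4 ** z for z in range(max_zoom + 1))


print(tile_count(14))  # 357913941
```

An edit near London only invalidates a handful of tiles per zoom, which is why servers that support expiry can keep up with minutely OSM changes without rebuilding the planet.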
There's one big problem: you have to write everything in JSON. Writing SQL in JSON is not an experience you want, because you have to escape newlines. With OpenStreetMap, we went to a great effort to get away from this; you do not want to start doing it again.

Tegola is a new project. It is written in Go, and the development is somewhat active. As for drawbacks, it doesn't have a tile store implemented, so you have to put one in front of it yourself, which is actually really tricky. It also has a few missing features. I believe those have been added recently, but they have not made it into a release. The rules are SQL in TOML, which is similar to YAML: you can have line breaks, comments, and all the things you can't have in JSON.

t-rex is written in Rust. It is actively developed, mostly feature complete, and it has a tile store. It has a nice demo front end for development, so you can see what you are doing. That is very useful, because a vector tile is not something you can look at like a raster tile; it is a bunch of data instead. The rules are SQL in TOML, with the same advantages, and it works.

Then you have Mapzen's Tilezen, one of the other big options. Mapzen uses it for their own vector tiles, which follow their vector tile schema. It is currently SQL in a template system, and that is changing, too. Because of some particularities of what they are doing, they are introducing another abstraction to avoid going to PostgreSQL as often, which you don't need to use. Someone from Mapzen who is on the team that works on Tilezen is at the conference today as well. The big drawback is that it has a much more complicated architecture, with a lot more moving components, and it is harder to get started with. But it does everything that you need. It is also the only option here, other than node-mapnik, that I'm aware of people using in production for base map tiles at any kind of scale. People are just not using the other ones for big projects yet.
I will go through the rendering options quickly, for time reasons. We have OpenLayers, Leaflet, and Mapbox GL. Mapnik is raster-only. It has the best options and the style languages that people are most familiar with, and it has been around the longest. Tangram is written by Mapzen and has JavaScript and C++ versions. You write its styles in Tangram's YAML scene files, a format unique to Tangram, but they are not hard to write; I didn't find them hard to write. Then you have OpenLayers and Leaflet. These are in-browser, so they are not a good option if you want client-side rendering in an app on a phone. Their performance with something complicated is unknown, because people are not doing that with them. They are good if you are doing an overlay or something like that, with a few features and simple styles. However, one of them, OpenLayers, now has support for reading the same styles as Mapbox GL. Mapbox GL is written by Mapbox and has JavaScript and C++ versions, and like Tangram, it is suited for your phone or mobile device. Its styles are written in Mapbox GL style JSON, which is a reasonably well-documented format. But it suffers from the problems of writing any JSON: you can't write comments, and it tends to be verbose, with a lot of escaping. Because the style language is shared with OpenLayers, it has seen some standardization. I have written it as well. I don't mind writing it, I'm perfectly fine writing it, but it is not good for experimenting with this stuff.

I have more information in the blog post I mentioned earlier. For it, I implemented the same set of rendering rules in multiple engines. I couldn't do all of them, for time reasons and because some of them were complex. But I have gone through the process of setting these up and finding things that I wanted to do and couldn't. But we are out of time, so one or two questions, or I will also be around at lunch. At the back?
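A minimal style in the Mapbox GL style format shows both how it ties a map to one schema and why people find raw JSON awkward. The Python sketch below builds the style as a dict and serializes it. The tile URL and the `water`/`roads` layer names are hypothetical placeholders. Generating the JSON from a language that allows comments is one possible workaround, and an assumption here, not something the talk recommends. The `required_source_layers` helper lists the tile layers the style depends on, which are exactly the names that break if you switch schemas.

```python
import json

# Minimal Mapbox GL style (version 8); the URL and layer names are placeholders.
style = {
    "version": 8,
    "sources": {
        "osm": {
            "type": "vector",
            "tiles": ["https://tiles.example.com/{z}/{x}/{y}.pbf"],
            "maxzoom": 14,
        }
    },
    "layers": [
        {"id": "background", "type": "background",
         "paint": {"background-color": "#f2efe9"}},
        {"id": "water", "type": "fill", "source": "osm", "source-layer": "water",
         "paint": {"fill-color": "#aad3df"}},
        # Only draw features whose "class" attribute is "motorway".
        {"id": "motorways", "type": "line", "source": "osm", "source-layer": "roads",
         "filter": ["==", ["get", "class"], "motorway"],
         "paint": {"line-color": "#e892a2", "line-width": 2}},
    ],
}


def required_source_layers(style: dict) -> set[str]:
    """Tile layer names the style reads from: its coupling to one schema."""
    return {layer["source-layer"] for layer in style["layers"] if "source-layer" in layer}


style_json = json.dumps(style, indent=2)  # what a renderer would actually load
print(sorted(required_source_layers(style)))  # ['roads', 'water']
```

The comments in this sketch only exist in the Python source. Once the style is serialized to JSON, there is no place for them.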
So Mapbox itself has a proprietary option for generating vector tiles that is tied in with Amazon. Basically, they put the updates into S3 and have a DynamoDB database. They use Lambda tasks to update the vector tiles every time there's a change in DynamoDB, plus more AWS magic. Besides the fact that it is proprietary and closed-source, it is a complicated architecture. If you are not tied into Amazon, it is not a good option to start with. Yeah?

Audience question: (Speaker far from mic) right now, you can have rendering...

Paul Norman: I don't see AWS Lambda being a popular option for this stuff. Architecturally, everything that I have seen involving it has been complex, a bit of a Rube Goldberg machine, and in some ways costlier and less efficient. It is a complicated architecture that way.

If there are no more questions, we are going to break for lunch and meet back at 2:00. Thank you, speakers, for a great morning, and thank you, Paul. (Applause)
Live captioning by Lindsay @stoker_lindsay at White Coat Captioning @whitecoatcapx