Stable Linear-Referencing System for OSM – Transcription

Hello, I’m David Nesbitt, I’m lead at the Mapzen mobility team. I was one of the very few attendees at the first State of the Map US in Atlanta, I presented in 2010 to a classroom of about 20 people. It is good to see how the community has grown. Today, I’m going to talk about stabile linear-referencing system for OpenStreetMap. So, before talking about linear referencing, I will refer to our project of Voholla, a resource navigation software built on OpenStreetMap. The reason we talk about this today, we use the libraries and tooling in Valhalla to support the linear referencing system, and we support the linear referencing system in Valhalla for a couple applications. So first, we do dynamic run-time costing. And, rather than computing the cost and embedding them into the data, what we do is, at runtime, we have algorithms that use attributes of the data to generate the cost. This allows us to be much more flexible and build optional costing profiles for use at runtime. I will talk later about routing graph tiles, where we segment the data. Do I have this too high? Closer? It wasn’t high enough. Sorry. So we divide our routing network into tiles, similar to maps, where you have geographic regions, this allows better regional data access, better caching, mobile use, and allows us to target embedded platforms. And the next is matching, we can take GPS locations and match them to the road network in Valhalla, and match them to the linear referencing that we do. I went too far. So the problem we’re trying to solve is how can we associate external data to the OpenStreetMap road network, and make it usable for routing applications and other creative and new applications. Some of thisidate we want to associate, most of the data is not readily available to OpenStreetMap mappers, and some of it might vary over time, things like traffic and parking, not something that you can put in OpenStreetMap directly. Other data is proprietary, a company is gathering travel counts and things like that and build applications around it and want to associate to a routing engine, for instance. So to solve this problem, we are creating a set of stable identifiers to reference a point along a roadway. We want these identifiers to be stable so you can make reference to OSM data, and the iDs refer to the same section of roadway. We want the system to be extensible so, as OSM adds new roads, new IDs and linear referencing segments can be added. So one question is, why doesn’t someone associate the data to OpenStreetMap way iDs? There’s a potential issues with this, OpenStreetMap way iDs, they are not persistent, they can change as they go away, new ones can be added, things like that. And secondly, they provide inconsistent localization. Many OpenStreetMap ways are very short. The example here shows an overpass, which isn’t a very significant feature, but it has its own way iD, and that is a little wasteful when you are trying to talk about creating a compact representation of paths. And another problem is when there are large stretches of roadway, going through many intersections, it doesn’t really localize data to one small portion of that way iD. So the solution that we’ve come up with, we call OSM LR, which is OpenStreetMap derived location referencing. We put this as part of the open traffic project, that I will talk about in a couple slides, we are documentation on GitHub. It is all open source and readily available. We create these linear referencing in two different formats, one is protocol buffer, which is useful for integration with applications, because it is very compact and quickly parsed. But we also output geo JSON, which is very interchangeable with other tools. OSM LR is an interchangeable standard with Tom Tom, and we use that for location referencing. And the basic concept for location referencing, you define a set of location reference points, these are most generally intersection points, points that are going to be fairly stable in your map or your road network, and then you provide hints about what the path in between them will look like, how long it is, what the heading or bearing from the intersection is, things like the road classification, whether it is a link, a ramp, or a general roadway. So once you have these location reference points, an application can go back and match mem to the road network by finding candidates at each of the location reference points, candidate nodes and edges in the routing graph, finding the possible paths between those two, and selecting the path that most closely matches the length bearing and the other parameters. And we have several documentation and blog posts available at for more details about the OSM LR. So, what does this location referencing buy you, and how might you use it? So, to make the iDs stable and persistent, you really need some entity that is in charge of maintaining the OSM LR definitions. So right now, at Mapzen, have created a set of initial iDs, uploaded them that we will talk about later, and they are available to the community. So a sample use of OSM LR is on the right side of the slide. There are two phases to this. The first phase is that you have to take these OSM LR location references and associate them to your road network, your routing graph. We do this with Valhalla internally, we can take OpenStreetMap planet, take a portion of the planet, and ingest it into Valhalla to create the routing graph. And then we have a secondary step, we can bring in the OSM LR segment definitions and associate them to the Valhalla edges. So now we have a one to one, well, not always a one to one, but a correlation between the Valhalla edges and the OSM LR identifiers, or the linear referencing segments that belong to those edges. That creates a routeable data set with associations to OSM LR. And now an application that wants to associate data to this and use it, I will give an example of perhaps a motorcycle club has a lot of users that are submitting their favorite motorcycle rides, and they want to share them with the community and have routes that are influenced by which roads are most popular and best for riding motorcycles on. So they take their GPS data, run it through map matching, which will associate, or match the GPS data to the routing graph, which now has OSM LR segments associated to it, and so then they can now take and say, this particular OSM LR segment has this particular trip on it. And they can do things like, have the number of trips that take a particular OSM LR segment. And then, with Valhalla, you can come in and do a custom routing profile that then brings in, as it is doing the shortest path calculations, it can take in the OSM LR-based data, it knows that, ass it traversus the edges, it knows the OSM LR segment ID can look up the popularity, or the travel counts from the associated data, and use that to favor the roads that have, you know, more motorcycle use. And so the first implementation of OSM LR, and the development of OSM LR, was done with a project called Open Traffic, which was sponsored by the World Bank. It was an open-source project, or is an open-source project. You can go to GitHub and look at all the documentation and all the source code. But basically, they are able to ingest a lot of archive data from companies, one in particular Grab in southeast Asia, ingest the data, and match it to OSM LR segment iDs, and then have statistics on travel times and travel speeds along the segment iDs. This was done by Valhalla map-matching, and stored in various datasets, which the World Bank and government agencies can use. And on the right, you will see an example of a short route, and it points out the actual OSM LR segment iDs, the speed that is maintained in a separate dataset, and it is colored based upon the speed. So I wanted to mention the details behind OSM LR, I will not get into tons of detail, in the interest of time. And I am hopefully providing plenty of resources and links that you can go and look and find more information. I mentioned tiling, we maintained persistent iDs in OSM LR by using an iD that is – it has a geographic component, the tile iD, that follows the Valhalla tile specification, which is defined in this slide. And it also has a hierarchical component. So we segment what we call highways, which are OSM motorway, trunk, and primary tagged ways, we store them in a four degree tile, so it is a larger area, but there tend to be less of those, they are less dense. The next level is arterial, which are secondary and tertiary, they are one-degree tiles, which is a slightly smaller region, but they tend to be a little more dense. And, at the final level, are what we call the local level which, for traffic, we are just including the unclassified and residential drivable roads. At this time, we are not including cycle ways, paths, and others. We use Valhalla to generate the OSM LR linear referencing segments. It is a somewhat complex process, the link here to the technical preview describes it in much more detail. But basically, as it encounters nodes and edges in the Valhalla graph, it assigns segment iDs, most of the time, or a lot of the time, a single edge in Valhalla matches to a single OSM LR segment. But there are cases where multiple edges are merged into a single OSM LR segment. And this is where edge crosses an overpass, or maybe a driveway, or kind of a minor road that we don’t really care about. And then, kind of the least common, but most complex case is edges can be split into multiple segments. But we keep edges – I mean, we keep segments less than 1KM at this time. We skip what we call internal edges, these are edges that basically are internal to intersections, and just provide transitions between different roads at that intersection, and things like short-turn channels, or at grade transitions between two roads. So we have created an initial set of OSM LR segments for the world, here are some statisticings at the highway level. There are about 11 million segments, averaging about a half, you know, just over a half a kilometer in length. Level one, the arterial level, there’s 26 million, there are also reasonable lengths.
And level two, at the local level, you can see that there’s a whole lot more, and this is, I think, somewhat problematic. And also, the average length is very small there. They tend to be a lot of small segments. And we have uploaded this dataset on to Amazon web services public dataset program. We released a blog today, and Amazon is also tweeting and announcing this as well. And briefly, we are working on an update process, so at regular interviews, we can bring in new OSM data and regenerate the OSM LR segment definitions. The goal is that we want them to be stable, and not change. We don’t want to, you know, delete ones just because of small edits. There might be times when there are major changes to a road, or a new intersection added that is of consequence that will require us to deprecate, or remove, an iD and replace it with new ones. But the goal is to be stable under, you know, local edits, moving nodes, and slight adjustments to intersections, changing classifications, tertiary, secondary, things like that, tagging. We hope to maintain most of the iDs. And we are just starting to evaluate how effective this is, so we haven’t quite tuned it for all the tolerances that say this intersection moved out of a distance, we don’t really know if it is the same road anymore. But preliminary, what we did is we had an early version from four months ago, and we just tested it against the latest OSM, and after four months, you know, 96, 97, 98 percent still match. And we hope to do some tuning to improve those numbers, but the basic idea is most of these segments and most of the data will be persistent. There will be a small number that will churn and need new iDs. So some of the future work, I mentioned we want to experiment with the update process and the tolerances. We think it is important to have methods of tracking the lineage, if one is deprecated and is replaced by two new ones, we want to track that so anyone who has associated data to the prior one can decide whether they want to transition that data to the new segments or not. We want to extend to cycle ways, paths, other OSM ways, but in a way that does not interfere with the iD space that we created. So they will have a different segment of the iD space. I think that is about it. This is the team, the Valhalla team, that did most of the OSM LR work. That was a criteria for who gave the talk, whoever could spell it. And most of the team is sitting in the front row, we will be out at our booth, and a big thanks to Matt Amos, who was one of the regional OSM pioneers, who has been a long-time colleague, and did a lot of the original work with us. This is all open source, so if you have questions, comments, contributions, you can ask us on GitHub, you can email me, or get in touch with our team. Thanks. (Applause).
We are taking questions now. I have a question about the sub-segmenting of an edge, and I remember from the original technical preview, you used the Bay Bridge in San Francisco as an example. You had a 1KM threshold, you started from the beginning of the line string and moved it all the way to the end. I was curious, you mentioned the inclusion of intersections in consideration when creating OSM LRs now. For example, with the Bay Bridge example, traffic changes along the bend of the bridge, and it also changes in reaction, for example, on to the island. Would you consider the intersections or off ramps in developing OSM LRs now? Yeah, they would probably split, they are major intersections or exits off of there. So they would be part of that. We definitely have, you know, all of the links and ramps as part of the breakage of where the segment started and ended. Did that get your question? Yes, and I was curious, is there a logic to starting on one end, versus another, or how do you – how does that it threshold work? Is it a hard threshold, or – David: The threshold is fairly hard, but it doesn’t necessarily split into one kilometer. It will break it into equal parts. If it is 1.5 kilometers, it will break it at .75, so it doesn’t matter which end you start from.
Audience question: Hi, I’m Ahmed, and I would like to ask you about the data, you are dealing with the sole dynamic, how do you address the continuous change in data happening? How do you make sure that OSM LR is actually making sure that the new generator that is happening could we adopted? David: So you are referring to OSM being – Audience member: Yes, primarily. David: That is up in the air, we want to balance the pain of updating the segment space, and having whoever has associated data of that segment space, ingesting new changes, versus, like you say, being more reactive. Right now, we are probably thinking a monthly, or maybe even quarterly update. So, yeah, it will miss a lot of initial edits, but it should pick them up on the next time around. So it is, yeah, we have to still balance that, and you know, it is up to customers and clients and users to kind of guide that and just, you know, to figure out what that frequency should be.
Host: Thank you, David. (Applause). Live captioning by Lindsay @stoker_lindsay at White Coat Captioning @whitecoatcapx.