Peer-to-peer mapping for disconnected environments – Transcription
I would like to bring up our last speakers for this session: Gregor MacLennan and Steven Whitmore, who are talking about peer-to-peer mapping for disconnected environments.
Good afternoon, my name is Gregor MacLennan, and this is Steven Whitmore. We work for a nonprofit called Digital Democracy that works with marginalized communities around the world to empower them to use technology to defend their rights.
We are working with communities to help them create maps of their neighborhood.
In this case, it looks like this; it probably looks different from the neighborhoods you would map around the U.S. There is no internet service here, no cell phone service, and the relevant features in their neighborhood are not roads and buildings; they are hunting paths, streams, and the histories of their people in that region. These are also things that don't appear in satellite images or aerial images.
They are hidden under the forest canopy.
So why are these communities making maps, and how are they used?
Well, first of all, they are living in areas where there are no maps of the land, or of their existence. This is 700,000 acres of territory in southern Guyana, and in Google Maps you have two small villages and nothing else appearing. Communities want to be able to show that they exist, they want to show their presence and occupation of the land, and they want to use that information to advocate for their land rights and recognition of their presence there.
This is also a problem because governments are granting natural resource concessions to oil companies and mining companies on their land without their knowledge, ignoring their presence and their rights to that area.
And communities are then using their maps to support their advocacy efforts. They are using them to support land rights claims, and the maps are giving the communities a voice in decision-making sessions so they can assert their rights and give their point of view.
In terms of making the maps, these are communities that are survivors of a colonial history, where Europeans turned up and made maps of the area that excluded them, marginalized them, and ignored their rights. They don't want an organization to turn up and make a map for them. They want to be able to make the map themselves, and they want to control the process, the information that goes into that map, and how that information is used.
GPS is a great tool for this: it is easy to learn, so people can participate in gathering the data to make a map. But after you have gathered the information on the ground, then what? Traditional GIS is complicated and hard to learn, which limits who can participate in the process of actually entering the data and creating a map, and collaboration is really hard with traditional GIS, especially if you have no internet connection.
We are working in Ecuador with people from 54 different villages; that cannot be done one person at a time. We need people to collaborate.
So, that's when we started getting interested in OpenStreetMap and learning about the problems that are already being solved by the community here. OpenStreetMap has already done the work to lower the barrier to getting people involved in editing and creating map data. It is designed from the ground up for collaboration, solving the big problem of how people can work together to map the entire world.
It has a data structure that is flexible enough that it can map hunting paths and sacred sites, as well as cafes and traffic junctions.
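That flexibility comes from OSM's tagging model: every element is just a node, way, or relation with free-form key/value tags. As a sketch (the tag choices here are illustrative, not an endorsed OSM tagging scheme):

```typescript
// OSM's data model is just elements plus free-form key/value tags,
// so a hunting path is as expressible as a cafe.
const huntingPath = {
  type: "way",
  refs: [101, 102, 103], // ordered node ids tracing the path
  tags: { highway: "path", informal: "yes" },
};

const cafe = {
  type: "node",
  lat: 51.5,
  lon: -0.09,
  tags: { amenity: "cafe" },
};
```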
The challenge with OSM is that everything is public by default, but these communities want to make the decisions about what goes into an open map. And there are things that don't belong in OpenStreetMap, like historical and cultural data, that are crucially important to how these communities are trying to use maps.
And also, they are working offline. They don't just need to edit offline; they need to collaborate offline.
We are working on a peer-to-peer database, a server that replicates the OSM API, and a desktop app that builds on the iD editor and integrates it with the peer-to-peer database to facilitate offline collaborative mapping.
I will turn it over to Steven, who will talk about the technology behind this and the magic that makes it happen.
Steven: Hi, everyone.
There is a quirk of human psychology where the less you know about something, the more you tend to think you know about it.
And just being here for the last few hours has shown me how abundantly little I know about mapping and the way it is used. I'm grateful to be here to hear everything being shared.
I think we covered the main reasons why we use OpenStreetMap: all of these properties here. Things like Google Maps and other solutions are just not options.
And so, yeah, that is how we chose them.
So first, I want to talk about how traditional map-making works. So you have this, sort of, diagram where all the clients point to a single server, and that server is the source of truth for all the map data in the world.
And then these thin clients come along, like iD and JOSM, and they query small subsets of all the map data at a time, submit changesets, request changesets, and all coordination relies on this central point.
But, if you are offline, what do you do?
So some editors, like JOSM, can deal with brief periods of being offline. You can queue up your changesets and submit them when you are online again. But what if people are never online? What if they spend their entire time off the internet, what then?
Well, what if each device was also, like, OSM.org? What if it had a server and a database built into it, along with the client and the editor, all bundled together? And what if these programs could talk to each other, instead of talking to a central server?
So you end up with something like this: you have peers instead of servers and clients, where each peer is an editor, a server, and a database all rolled into one.
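As a rough sketch of that idea (hypothetical TypeScript types for illustration, not the actual osm-p2p interfaces):

```typescript
// Hypothetical sketch of a peer: all three roles bundled locally.
interface OsmElement {
  id: string;
  type: "node" | "way" | "relation";
  tags: Record<string, string>;
}

interface Peer {
  // The editor UI (e.g. iD) that people use to make edits.
  editor: { edit(element: OsmElement): void };
  // A local server speaking the OSM API, so existing editors just work.
  server: { handleRequest(path: string): OsmElement[] };
  // A local database holding the map data.
  database: { put(element: OsmElement): void };
  // Peers replicate directly with each other, no central server involved.
  syncWith(other: Peer): void;
}
```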
It is funny how technology tends to mirror the society that creates it, and the society’s values. I think a peer to peer solution tends to mirror the society and values of the people that it wants to serve.
So now I will talk about the database piece. There's the OSM.org database, the central source of truth, and this works really well in a centralized environment where the clients ask for subsections. Traditional databases operate like a snapshot in time: what is the current state of all of the nodes, ways, and relations in the world right now?
We have a table with these three points with different IDs; these are the points that exist in the world right now, and their state.
But this is fragile. What if we accidentally insert the wrong value? We need to find a backup somewhere and restore it. What about historic data? What if we change the format of the database? We write a migration script, and hopefully that doesn't get messed up, and you are going to have offline time where the server can't work because we are running the migrations. And, of course, it means collaboration is limited to one central source of truth.
So there is this thing called Kappa architecture, where that single source of truth is called an append-only log. It is a log of all of the data operations ever performed, in order.
And it can only be added to: entries cannot be modified or deleted. So there's a certain power in being able to say that entry number five is this data, and that will always be true.
So, this is the same dataset, but with this additional column, the sequence number. This is the append-only log: first entry, second, and so forth. And we can see that IDs 77 and 12 have older values that are different. So the full set of historic values is always captured, and nothing is lost.
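A minimal sketch of such a log (the field names and coordinates are illustrative, not the actual osm-p2p schema):

```typescript
// A minimal append-only log: entries are only ever appended,
// never modified or deleted.
interface LogEntry {
  seq: number;                                   // 1, 2, 3, ...
  elementId: string;                             // which OSM element
  value: { lat: number; lon: number; tags: Record<string, string> };
}

const log: LogEntry[] = [];

function append(elementId: string, value: LogEntry["value"]): LogEntry {
  const entry: LogEntry = { seq: log.length + 1, elementId, value };
  log.push(entry);                               // the only write allowed
  return entry;
}

// Element 77 is edited twice; both versions live in the log forever.
append("77", { lat: 2.81, lon: -59.16, tags: { name: "camp" } });
append("12", { lat: 2.79, lon: -59.2, tags: {} });
append("77", { lat: 2.81, lon: -59.16, tags: { name: "hunting camp" } });
```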
But then, if you do just want the latest values, how do you get those? From these logs, you create materialized views. A materialized view is actually just like the traditional database table I showed you before, except the table is materialized by running a function over every entry in the append-only log.
You could have a lot of materialized views: for example, taking the latest values for each node, or OSM contributors by region. Anything you can compute by running a function over the log can produce up-to-date data.
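Continuing the log sketch above, a "latest values" view might look like this (again, an illustration rather than the real implementation):

```typescript
// A "latest values" materialized view: fold a function over every
// log entry; later sequence numbers overwrite earlier ones per element.
function latestValues(entries: LogEntry[]): Map<string, LogEntry> {
  const view = new Map<string, LogEntry>();
  for (const entry of entries) {
    view.set(entry.elementId, entry);
  }
  return view;
}

const current = latestValues(log);
// current.get("77") now holds the seq-3 entry; the seq-1 value is
// still in the log, just not in this view.
```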
Nine out of ten developers agree that Kappa architecture is a great way to go. If you use Git, you are probably familiar with these ideas. Views are cheap: if you can write a function, you can create a new view.
They are incremental: if I have processed up to sequence number five and pull in five new entries, I only need to process those new entries and the views are updated with the latest data.
And they are disposable. If I change the data format of one of these views, instead of trying to update it in place, I can just delete the view, process the log over again from sequence zero, and produce the new view with the new fields.
So logs are very durable and cacheable, which are very nice properties for peer-to-peer environments.
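Both of those properties fall out of keeping a checkpoint against the log; continuing the same sketch:

```typescript
// Incremental: keep a checkpoint of the highest sequence processed,
// and only fold new entries into the view.
let processedSeq = 0;
const liveView = new Map<string, LogEntry>();

function updateView(entries: LogEntry[]): void {
  for (const entry of entries) {
    if (entry.seq > processedSeq) {
      liveView.set(entry.elementId, entry);
      processedSeq = entry.seq;
    }
  }
}

// Disposable: to change the view's format, throw it away and replay
// the log from sequence zero.
function rebuildView(entries: LogEntry[]): void {
  liveView.clear();
  processedSeq = 0;
  updateView(entries);
}
```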
And what about sync? We have offline editing, but how do peers synchronize? Well, there are four steps: figure out which log entries each peer is missing from the other, send them, append them to the local append-only log, and then update all the materialized views with that data.
So if I edited a node and Gregor edited a node, we would both exchange our edits, and you would know which edits are newer, because they have a higher sequence number than the current sequence number you see in the dataset.
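In toy form, the exchange could look like this (a deliberately simplified model; real replication protocols also handle streaming, authentication, and many devices):

```typescript
// Toy two-peer sync. Each peer keeps one append-only log per author,
// so "what am I missing?" is just "entries past my highest seq for
// that author", because each author's log is append-only.
type Logs = Map<string, LogEntry[]>; // author id -> that author's log

function sync(mine: Logs, theirs: Logs): void {
  for (const [author, theirLog] of theirs) {
    const myLog = mine.get(author) ?? [];
    // Steps 1 and 2: find and fetch the entries I am missing.
    const missing = theirLog.slice(myLog.length);
    // Step 3: append them to my local copy of that author's log.
    mine.set(author, myLog.concat(missing));
  }
  // Step 4: update the materialized views from the combined entries,
  // e.g. rebuildView([...mine.values()].flat()).
}
```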
A really nice thing about centralized services is that unique identifiers are an easy problem to solve: the server just adds one to the newest ID. In offline environments, there is no central source of truth, so nobody can say what the next ID for the next OSM element should be. So instead we generate random 64-bit numbers, and fortunately that is an astronomically large space of numbers, so there's a low probability of collision. I say low with an asterisk, because it depends on how many IDs you are generating. If you are generating a huge number of identifiers, 64 bits might not be enough; you run into lottery-winning odds of a collision.
And if you want more headroom, using 128-bit or larger IDs in a peer-to-peer setting works too.
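A minimal version of that ID generation, using Node's crypto module (the exact scheme osm-p2p uses may differ):

```typescript
import { randomBytes } from "node:crypto";

// Generate a random 64-bit (or 128-bit) identifier as a hex string.
// 2^64 possible ids sounds huge, but by the birthday bound you reach
// roughly 50% odds of one collision after about 2^32 (~4 billion) ids,
// which is why very large deployments may want 128 bits or more.
function generateId(bits: 64 | 128 = 64): string {
  return randomBytes(bits / 8).toString("hex");
}

generateId();    // e.g. "9f3a0c2d7b41e855" (16 hex characters)
generateId(128); // 32 hex characters
```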
And a really tricky open question that we have been stumped on is forks. An example of this: one of our monitors edits a hunting camp offline on his laptop, and Eliza also edits that hunting camp offline, the exact same OSM element. When they go to sync, what happens? Which hunting camp is the right one? Are they both correct? Is one correct?
It is an inherently ambiguous situation: you have an append-only log of elements, and then you have a fork, where two people are saying two different things about the data at the same time. If you use Git, there's a conflict resolution mode where the world comes to a halt and you have to figure out how to solve the problem. That is great if you are a developer merging branches, but it does not represent our users' experience.
Right now, we use newest wins: whichever of the elements has the newest timestamp, we show that one. That is a materialized view in the osm-p2p server. Most editors these days, like iD, don't know anything about forks. So by presenting the newest element in a forking situation, it mostly works, but you see situations where data is lost, or shadowed.
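That newest-wins rule is itself just a materialized view over fork heads; a sketch, with an illustrative timestamp field:

```typescript
// "Newest wins" as a materialized view over fork heads: among the
// competing versions of an element, display the most recent edit.
// Nothing is deleted; the losing head is only shadowed in the view.
interface ForkHead {
  elementId: string;
  timestamp: number; // ms since epoch, set by the editing device
  tags: Record<string, string>;
}

function newestWins(heads: ForkHead[]): ForkHead {
  // Assumes at least one head exists for the element.
  return heads.reduce((a, b) => (b.timestamp > a.timestamp ? b : a));
}
```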
Forking data is natural, and ambiguity is natural; we want to embrace the subjectivity of the human experience. There are a lot of solutions. There are social ones: Gregor and I agree not to map the same place, the same area, on the same day. There are mechanical solutions: if I edit some tags and Gregor edits the longitude and latitude, maybe those are safe to merge without intervention.
And interventions can be things like viewing the versions side by side and choosing one, and there could be Git-style merge resolution options: you could have three-way merges that you pull up and customize exactly.
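Here is what one of those mechanical merges might look like: if the two fork heads changed disjoint fields relative to their common ancestor, combine them automatically; otherwise fall back to intervention. Hypothetical types and logic, sketched for illustration:

```typescript
// Three-way merge of a node when the two sides edited disjoint fields.
interface NodeVersion {
  lat: number;
  lon: number;
  tags: Record<string, string>;
}

function mergeDisjoint(
  base: NodeVersion,
  a: NodeVersion,
  b: NodeVersion
): NodeVersion | null {
  const aMoved = a.lat !== base.lat || a.lon !== base.lon;
  const bMoved = b.lat !== base.lat || b.lon !== base.lon;
  const aTagged = JSON.stringify(a.tags) !== JSON.stringify(base.tags);
  const bTagged = JSON.stringify(b.tags) !== JSON.stringify(base.tags);
  // Safe only when one side moved the node and the other edited tags.
  if (aMoved && !aTagged && bTagged && !bMoved) {
    return { lat: a.lat, lon: a.lon, tags: b.tags };
  }
  if (bMoved && !bTagged && aTagged && !aMoved) {
    return { lat: b.lat, lon: b.lon, tags: a.tags };
  }
  return null; // ambiguous: needs human intervention
}
```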
And because the append-only log guarantees the history is never lost, just because we show newest wins today does not mean the data will be gone if we start showing forks in the future. So we are excited about finding new solutions to this problem, and about making OSM editors aware of forks. Because this is going to be a problem for traditional OSM clients that want to work offline, even if they prefer OSM.org.
Cool. In summary: OSM is super cool. Having no servers is a powerful thing that de-marginalizes communities that wouldn't otherwise have access to the internet. We want to reuse existing OSM tools so we can have a symbiotic relationship with that community. We model data as append-only logs for durability, and we generate random IDs to get unique identifiers in these contexts.
Here are some links; you can check out Mapeo for Windows and Mac. Cool.
(Applause).
(Speaker far from mic).
Yeah, we will set it up.
Questions? All right.
We have a break coming up, so go ahead and grab some water, we will start back up at 3:15.
Live captioning by Lindsay @stoker_lindsay at White Coat Captioning @whitecoatcapx.
Host: If any of my 3:15 speakers are here, could they come up so we can have a chance to get together and talk.