Humans and Robots Making Maps Together – Transcription
Another talk about machines and robots. It is my honor to introduce Ian Schuler, CEO of Development Seed. Ian is an evil genius and an icon. He also organizes conferences, so we had SatSummit earlier this year, and there is a rumor that there will be another SatSummit and State of the Map US. Ian is also on cold medicine, so this should be fun. The really strong stuff. In Colorado, we have other strong stuff for medication.
Yes, I’m already in trouble with the hosts.
No, Sean screwed up the windows here.
I do want to say, Development Seed had a bit of a role in helping to organize the State of the Map, and mostly Laura and Dillon and a few others. I added zero value to the organization, but I did get to lurk on the Slack channels. It was so incredibly impressive seeing, between the OSM US group and the host community, how dedicated they have been in the last six months to make it happen. And Kevin and Ian and Alyssa and others, making it happen, it is impressive to see what they put into this and so exciting to see you all here. I’m really happy to be here.
I will not talk about anything that we are doing right now. If you were here last year, you might have seen a representative from Development Seed talk about how we are taking Mapbox data and using that to predict roads. That is online, Skynet, check it out if you are interested. The thing I want to draw from that for context: in machine learning, we have the imagery figured out and the compute figured out. The hardest part is getting the training data, and getting those labels.
And for most of the world, that data doesn’t exist. You don’t have government entities with 20 years of really clean records. So, for most training – most machine learning that is going on in the developing world, you are providing the training data. That is coming from OSM, that is something we rely on heavily and others do as well. So for you all, whether or not you are doing the machine learning, you should know that you are all involved in this project. That’s what I wanted to highlight from there.
So humans and robots, so rather than – what I want to do here is sort of look towards the future. I think that there has been a really healthy and vibrant debate within the community about what is an appropriate role for these techniques within the OSM community, and how do we make sure that we preserve the important things about our community, and the human beings that build our community, and while also integrating some of these tools. That’s what I want to approach today.
And by way of prologue, humans, yeah, they are a pretty important part of this equation, they have done some really impressive stuff. Drishtie did a much better job talking about this than I did, so I will keep it brief. We recognize that OSM is at a very exciting point in its development, it is a bit of a turning point. We mapped a lot and, as a result of our efforts, OSM data is now part of the infrastructure for a lot of efforts. It is part of the infrastructure for groups like Facebook and DigitalGlobe and Apple and others. It is part of what they rely on, it is part of the infrastructure for groups like the Red Cross, and World Bank, and other people who are doing development work, for governments. People are using and relying on that, and it is a lot of responsibility for folks like us. But it is really important.
And the other thing that has happened recently is that efforts like Missing Maps have expanded the reach of OSM into a lot of places that it hadn’t been before. You will notice from this map, and even better from Drishtie’s map, that there are a lot of places that are missing, and hers is better. And some of these are in places like China, Pakistan, some of the former Soviet Union, and India, countries that range from a situation where citizen mapping is in a legal gray zone, to countries where citizen mapping is actually prohibited by law, and people who map on the ground are doing so at great risk.
So, that’s another aspect of this that I think is worth keeping in mind when we think about how is the best way to collect data for those areas.
Yeah, and the other thing that I would say, this is still a hard job, and there’s a lot of work left to do.
It – it is foolish to bet against us, to bet against OSM, but it is also just as foolish to assume that it is a given thing.
And I think that, as we get to parts of the world that are harder to reach, that is where we need to get more creative about how we map those parts of the world.
So the question is, do we care? Should we map at all? Are there reasons why we should endeavor to get every single thing on the map, or is it enough to map in the areas where we have the community, and have the reach? I think there’s a number of reasons why we should.
One is the network effect. Our data is more valuable if other people’s data is there as well.
Ian’s house – Iansopenhousemap.org is not a useful site for a lot of things. People do not just care about the place they are in, but places they want to go to and where they are connected to, and how they compare to other places. So part of the reason is their own selfish interests, our data has more value if we have other data in there as well.
The other is that we need this, the places that are the most vulnerable in the world are places that most require this sort of information to plan better, to recover from disasters. And not just recover from disasters, but also to better prepare, to have more resilient cities, to have companies that can build the next transport app on top of open data. These are all things that are critically important, and why we need this data.
And I think the other is that, from a moral standpoint, from an ethical standpoint, everybody deserves to be on the map. Everybody deserves to be on the map, whether or not they are in a position to put themselves on the map today.
And so, while we absolutely should continue our efforts to build this community, we absolutely should be training people on how to map, and how to map their communities. And we also can’t wait for that effort in order to expand the map across the world.
So what I think we should be thinking about is an overall approach where we are empowering this community that we are developing to be better at doing mapping, to be more effective in the ways that they map, and to be doing it a lot faster and more accurately.
And so, where do robots come in on that?
So, there’s some things that robots are better at than humans. There are, well, I will get to that in a second. There are some things that they are able to see that humans can’t. For instance, this image of a power line is really, really hard for humans to see.
But actually, it is very easy for robots to see, because they are looking at every pixel and they can distinguish things.
And they look at data that is hard for human beings to interpret. We are pulling it out using SAR imagery, which is terrible for a human being to look at, and even processed, it is terrible for a human being to look at. But computers don’t care, pixels are pixels. So there are certain sets of data that are hard to map on the ground, or without having some help. Mangroves are another one, it is hard to go with your GPS and map the extent of a mangrove region.
So that is one area. And what I’m more interested in talking about today is where robots, computers, and machine learning can aid in the community effort and make our work better. And these are a few things that come to mind offhand. One is targeting attention to the areas that need it most, in a disaster situation. If a robot can do a pass at the data and get an idea of which areas are undermapped, or where buildings are missing from the map, then it can target attention to where, particularly in a post-disaster situation, people can make the most difference most quickly. They can help in providing feedback, on-boarding people, training people, and providing quality assurance. They can help in directing users to areas that are appropriate for their skills: is this an easy area to map roads in, or a difficult one, which is an idea that Blake had at the HOT Summit.
There’s boring stuff that is good for computers to do and that tires people easily. If you can take the cognitive load off and allow humans to do the parts they are good at, that is another way you can make it more interesting, and allow people to be more effective, to do more of what they are better at, and to work faster.
So, I wanted to present a rough strawman of how this could look in practice. This is something that we developed at the HOT Summit, and we had some really good feedback from people there. Some of you may have already seen this, and I think it is a good strawman workflow of how machine learning could be integrated into the existing workflow for HOT or OSM.
You could imagine that somebody setting up a HOT task could run a scan of an area and pull out contextual information that allows mappers to better plan their time, or better select areas to map.
So in setting up an area of interest, you can do that scan, and then in the grid that comes back, you could provide some contextual layer, and that might be focus on here, because it looks like there’s a lot of damage here, or that could be, oh, you are a beginner, these look like easier squares for you to engage in, you are an expert, maybe you are focused on this part of highly-dense urban area that is actually really difficult to map.
Those are the sorts of things that a computer would be really good at.
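As a rough illustration of that kind of contextual layer, here is a minimal Python sketch. It is not how any existing tasking tool works: the `Tile` fields, the density threshold, and the idea of comparing a model's predicted building count against what is already mapped are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    tile_id: str
    predicted_buildings: int  # hypothetical output of a model scan
    mapped_buildings: int     # buildings already present in the map
    building_density: float   # buildings per hectare, used as a difficulty proxy


def unmapped_gap(tile):
    """Estimate how many buildings the scan saw that are not yet mapped."""
    return max(tile.predicted_buildings - tile.mapped_buildings, 0)


def difficulty(tile, dense_threshold=30.0):
    """Label dense tiles for experts and sparse tiles for beginners (assumed rule)."""
    return "expert" if tile.building_density >= dense_threshold else "beginner"


def triage(tiles):
    """Rank tiles by estimated gap, largest first, with a suggested skill level."""
    ranked = sorted(tiles, key=unmapped_gap, reverse=True)
    return [(t.tile_id, unmapped_gap(t), difficulty(t)) for t in ranked]


tiles = [
    Tile("A1", predicted_buildings=120, mapped_buildings=110, building_density=45.0),
    Tile("A2", predicted_buildings=80, mapped_buildings=5, building_density=12.0),
    Tile("A3", predicted_buildings=10, mapped_buildings=10, building_density=3.0),
]
result = triage(tiles)
```

In this toy data, tile A2 ranks first because it has the largest gap between what the scan suggests and what is mapped, and it is flagged as a beginner-friendly square. The humans still choose what to map; the layer only informs that choice.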
And another, when somebody selects an area, you can imagine a situation where a machine learning process did a quick scan to try to identify features and made suggestions on where it thought these were. If the computer did a good job, this is a cleaning effort and comes from human validation rather than drawing from scratch. We need to allow people to get rid of those, if they were not accurate, if the computer was very off. And we think that, in a lot of cases, we have seen that this can be a good way to save some time.
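To make the "cleaning rather than drawing from scratch" idea concrete, here is a small Python sketch. The suggestion format, the confidence cutoff, and the function names are hypothetical; the one thing it tries to show is that nothing reaches the map without an explicit human decision.

```python
def split_suggestions(suggestions, min_confidence=0.5):
    """Separate model suggestions worth showing a mapper from likely noise."""
    to_review = [s for s in suggestions if s["confidence"] >= min_confidence]
    discarded = [s for s in suggestions if s["confidence"] < min_confidence]
    return to_review, discarded


def apply_review(to_review, decisions):
    """Keep only suggestions a human explicitly accepted; the default is to drop."""
    return [s for s in to_review if decisions.get(s["id"]) == "accept"]


# Hypothetical model output for one task square.
suggestions = [
    {"id": "r1", "kind": "road", "confidence": 0.92},
    {"id": "r2", "kind": "road", "confidence": 0.30},
    {"id": "b1", "kind": "building", "confidence": 0.75},
]

to_review, discarded = split_suggestions(suggestions)
# The mapper accepts r1 and rejects b1 because the computer was off.
accepted = apply_review(to_review, {"r1": "accept", "b1": "reject"})
```

Here the low-confidence road never reaches the mapper, and the building the mapper rejected is dropped, so only `r1` would go forward for editing.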
It also means that we need different tools to do mapping. I know that Facebook said that they are working on other versions of iD that are for cleaning the outputs of a computer vision process, that is something we have put work into. There’s a tool, Scrub, open source, on GitHub. And that is a very early-stage version, very stripped-down version of iD, or alternative to iD, that is focused on cleaning the messy – the common sorts of messiness in road prediction.
And Drishtie mentioned OSM, and they mentioned certain things that are problematic. Wouldn’t it be great if we had a process that was learning, as the community develops, what are the new ways that people are making mistakes and what are the new things that people are looking for, and highlighting those, so the limited people that are available to do that verification on OSM are putting their attention where it is most likely to be useful.
So how do we do this, and move forward on this, what are the principles we should espouse in trying to get this right?
One is humans first, this is about taking the existing community and empowering it to be more effective. We should be – nobody should be under the assumption that robots are going to replace humans, we are helping humans be better, human needs should rule the day, and with visibility, that is something that we should give priority today.
And open and auditable algorithms, particularly when we are talking about building things into existing workflows, it is important that the math is open, auditable, so people can trust and repeat it and improve it.
If you are not immersed in machine learning, there’s a number of parts to this process. It begins with training data, which involves imagery and label data, the ground truth, and all of that is the input data. Then there’s the framework for turning the data into an equation, and the output is a set of math, a model, telling the computer how to identify roads or buildings.
The piece that I’m focused on is the math at the end. I think it would be great if the other parts are open, I think there’s a lot of value to that. But for the purposes of – the argument that I’m really making here is that the last piece, the model, the math behind it, should really be open if you are integrating into workflows.
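A toy Python sketch of those three parts follows: input data, a framework that turns it into math, and a model that can be published and re-run. It is deliberately trivial. A single brightness cutoff stands in for a real road-detection model, and the pixel values and labels are invented for illustration.

```python
import json


def train_threshold(pixels, labels):
    """Pick the brightness cutoff that best separates road (1) from non-road (0)."""
    best_cut, best_correct = None, -1
    for cut in sorted(set(pixels)):
        correct = sum((p >= cut) == bool(y) for p, y in zip(pixels, labels))
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return {"type": "threshold", "cutoff": best_cut}


def predict(model, pixel):
    """Apply the published model: 1 means road, 0 means not road."""
    return 1 if pixel >= model["cutoff"] else 0


# Input data: imagery (pixel brightness) plus labels (the ground truth).
pixels = [10, 20, 30, 200, 210, 220]
labels = [0, 0, 0, 1, 1, 1]

# The "framework" step produces the model, which is just a small piece of math.
model = train_threshold(pixels, labels)

# Publishing the model lets anyone inspect it and repeat the predictions later.
published = json.dumps(model)
reloaded = json.loads(published)
```

The point of the sketch is the last two lines: once the model itself is open, someone else can reload it and get the same answers, even without access to the original imagery or labels.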
If you are producing high-quality data, you can get this into OSM and Facebook found a successful approach to doing that. But again, for workflow, and I would be happy to hear discussion on that.
And OSM is not a bunch of people sitting around, it is JFDI, just Fucking Do It, a community of doing things, as we say at Dev Seed. What I would love to see coming out of this is a consensus among people who are interested on what the next steps are and which things are most important. I would love for us to come out of this weekend with a road map of where we are going in the community so we can put forward the resources from the different companies and NGOs that are involved, and the thing that is holding us back is that clear vision.
And the other part of leadership that is beginning, Sean mentioned in the beginning of this talk, OSM Foundation. So Kevin talked about in the beginning about how important it is to be a part of OSM US, absolutely agree, $20 a month, the best $20 you will spend, sorry, $20 a year.
And you can support this community and all the great work that they have been doing. And at the higher level, the OSM Foundation, they maintain the infrastructure and, in the leadership realm, how that is deployed. Infrastructure is part of how this unfolds, and contributing to that as well is really important to make sure that we have the resources to do this, but contributing your voice to that is just as important. And more so than the $20 that is involved is being involved in voting, being involved in meetings and having a real voice in how this develops, that is going to allow us to run fast.
So, very much encourage you to find McKell. Is he still here? Ah, McKell, or find Dale, if you want to join the OSM Foundation. All right. That’s all I have.
All right, thank you. (Applause).
Don’t go too far, okay. Jennings, thank you. Traveling mic! All right, we have two mics now for questions.
(Speaker far from mic).
We can’t hear you, maybe I’m the only traveling.
This is Jennings. What was my question now? So my question is, as you put out this machine learning data into OpenStreetMap in areas that are not mapped, what will that do to the local community? There’s the impression that if you put this data in there, it will stymie local community growth. That’s a big concern to a lot of us. Yeah. We would like to hear thoughts on the approach, how you solve that.
That is a great question, and one that Drishtie will have better answers to than I do. As she comes up to answer that, one thing I would say: I didn’t get into OSM at the beginning, and by the time I did, DC was largely mapped, and I was not discouraged by that. The one thing that machine learning cannot do is tell you whose store this is, what they sell, the times they are open, and all the other data on the ground that is meaningful to people. It provides other valuable information, but there is so much more that people in that community want to say about the places that they live and go to, so I am less concerned about that. It is something that needs to be handled in a smart way, and I think that conversations like this, and that focus on humans, are a part of that. It isn’t just people in lab coats in a basement somewhere, figuring this all out. It is people who are here, in this room. So I recognize that it is an important thing. I don’t know if you have anything you want to add on that.
Yeah, I think that’s a good question, we got that a lot in the beginning. And my biggest takeaway was, working with the Red Cross, we did a lot of manual mapping, going to another country and walking door to door. Even though we had remote tracing, it took a couple of months to do a small city. The world we are talking about is remote, no power, no WiFi, you are traveling to places where the local mappers, or the local people in those areas, don’t always have phones. And so, those are kind of the areas that everyone is looking to map, especially at scale.
And the other thing is, I don’t know how many people have gone to a mapathon, but drawing roads is not the easiest thing, not something that people are going to want to do. And to give transparency, even with machine learning, looking at India, if we did it with machine learning, it would take us to 2025. This is one country. So the amount of mapping that needs to be done is so large that, even if we put in the road geometry, it will take years and years to add the name of the road, how many lanes it has, what quality it is, and whether it can take a truck. So the more that we think about those parts of the world, we will need local mappers, and we cannot do it without them. It is just not possible without that understanding. But there is just a lot to do, there is really no shortage of work.
Yeah. At least the way I have seen it.
And, looking at your slides, you had a team of, what, 24 people who are doing the actual validation? I think that’s the thing, more so than, let’s not – we will draw up machine learning, how do we make sure that the people that are doing that validation are people in the local communities as much as possible and that they are being involved in this loop, they are being involved in this process, they understand it because the key parts are open and available to them. Like, how do we make sure that that is involving them as much as possible? And I think that some groups are doing a better job than others, and the work that Facebook is doing in involving the local communities is a good example of getting that right.
A couple questions back there.
So I really appreciate your idea of sharing the model, making the model openly available. But what is your vision for dealing with the fact that deep learning output, and models that are completely – they are as uninterpretable as any machine learning model could be? How useful would that be to the community?
Um, it is a good question, and I think that some of it is that open source in general is hard to work with, open source code. But the importance of having it open is, for one, the ability to repeat that thing, even if you don’t understand it. And, like, the license and the rights to do so, which is really important for people who are doing more research-oriented work, to be able to know that the things that they’re working on today, they are going to be able to run in three more years, and get similar results. And I think that one of the things that has enabled this explosion, the reason we are talking about machine learning today, is that compute is better now than three years ago. And that allows people to get into the space at an amateur level. And I think it is an area that needs its own infrastructure in terms of having good models, and having good training data.
The SpaceNet challenge is a really good example of how OSM and satellite imagery have been put together in a way to enable that, and there is so much more that could be done. I get the question, but to be perfectly honest, I feel like it is a red herring. There’s a lot of value to having this be open, and that problem is something that should be addressed, but it shouldn’t prevent making it open to begin with. Thank you.
One more question, and then it is coffee time.
Does that one work? Hi, so going back to that idea of so much work still needing to be done on qualities like what is this store, how do you inspire people that have never been part of OSM to come to a mapathon, if they don’t get to draw pictures? From my perspective, I think the editing is a little bit more exciting if you are visually contributing to the map instead of just editing attributes. So if you have a message to inspire for that, that would be great.
I’m not the best person to come up with that message, I think there are other groups that have been much more involved in putting together mapathons and would have a better answer. The thing I would say is, my recollection of the early days of OSM is that a lot of it was point-based anyway, it was not drawing outlines as much as putting the things on the map. And I think that still translates where you have pre-drawn boundaries. I don’t – I don’t know.
Maybe there is – I don’t know. I mean, I think that it is definitely something that there should be a conversation about. I guess, if the implication is that this means nobody is going to map anymore, there are ways that we can use these tools to make it more fun and rewarding, and that is what I’m excited about. And I definitely want to make sure that we get that right and we are taking that into consideration, and not stripping away the things that actually are fun.
One final round of applause for all of our speakers. This was a great session. (Applause). So coffee and snacks, what’s that? We are pushing the next session back five minutes, we have 10 or 15 – five minutes or so for break. We will get back on schedule, it is important to do that because we all want to eat lunch at the scheduled time. So get caffeinated, grab some food, and head to the next session.