New Contributors and Nepal Earthquake – Transcription

Okay, everybody. We’re going to go ahead and get started so that we can roll through these. And then, as a reminder, as soon as we finish the last talk, we’re going to very safely go past this had area out to the terrace and get organized for that satellite and photo portrait. So our first speaker is David Parr. He is at Metro State University down in Denver. And he is going to talk to us about using OSM with the Nepal earthquake.

Hey, good morning, everyone. Is this on? Good morning, everyone. So, yeah, my name is David Parr. I’m an assistant professor of geospatial sciences. And I teach in a small suburb of Boulder called Denver. And today I want to talk about what happened around the new contributors and the Nepal earthquake in 2015. And really my talk is designed around the issue of, you know, why do we care about data quality of online spatial data? I know there’s been a lot of talks about data quality, but I want to talk about, obviously, why it’s important in a critical situation, but also how do we measure it and why do we care about it? And so, as an educator, I think about data quality a lot because I want to be able to teach my students what they need to know about spatial data quality. And so if you think about it, as consumers, why do we care about the quality of data that we get? As data creators, and a lot of us in the room are data creators, all of us are data creators, how do we ensure and communicate the quality of the data that we have? And as an instructor, I really think about how to teach data quality to my students so they understand the importance and the aspects. What could possibly go wrong? I don’t know if you remember, in the Bosnia conflict, an embassy was bombed and the U.S. blamed it on spatial data quality. So the quality of the data that we use is important. So from an academic point of view, we lump geospatial data quality into six characteristics. 
The lineage of the data: the history of the geographic data set. The accuracy, particularly the positional accuracy: how closely does the data represent its true position on the ground? And then, of course, attribute accuracy. Is it the thing we say it is? Does it have the right characteristics? Logical consistency. Not from OpenStreetMap, but I once downloaded a database of Starbucks locations. Sometimes it’s Starbucks with a capital S, sometimes a lower-case S. And sometimes two words. And my favorite, Starbucks with dollar signs instead of Ss. Logical consistency: how to be consistent with data. And completeness. Do we have all the data that we need? And finally, the usage, purpose and constraints. What’s the purpose of the data? Who can use it and what are the limitations of the data that we have? And then the last aspect: all geographic data, all spatial data, has some sort of temporal quality, a temporal aspect. It was created for a particular time, at a particular time, or is only good for a particular time. So that temporal quality isn’t something we always think about. But it’s important when we talk about data, particularly in the event of a crisis. Because in the event of a humanitarian crisis, who created the data and when, and how long is it good for? It’s of critical importance.

So when we get to talk about data quality in OpenStreetMap, the traditional ways that we think about data quality in GIS change quite a bit. The traditional ways are either using ground data, ground truthing, to look at the data, or comparing against authoritative datasets. A lot of research on OpenStreetMap has looked at both: how accurate is the data compared to other datasets? How accurate is it on the ground? But it has also looked at the changes: do more changes in OpenStreetMap lead to more accurate data? And the answer is generally yes. The more times that a feature has been changed, the more accurate that feature is on the ground in terms of positional accuracy. 
That’s a unique way to measure the data quality of OpenStreetMap: to think about, in general, how many changes have impacted the data. How often has that changeset been done, so we have an idea how accurate that data might be?
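
As a rough illustration of the idea above, and not the analysis used in the talk, the sketch below treats the highest version number seen for a feature as a proxy for how often it has been changed, then reports the share of features revised at least once. The `Edit` record, its field names, and the sample feature IDs are hypothetical; real data would come from an OpenStreetMap history extract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """One edit to a map feature (hypothetical, simplified record)."""
    feature_id: str  # assumed identifier format, e.g. "way/1"
    version: int     # version number stored with this edit
    user: str        # name of the contributor who made the edit


def latest_versions(edits):
    """Map each feature ID to the highest version number seen for it."""
    latest = {}
    for edit in edits:
        latest[edit.feature_id] = max(latest.get(edit.feature_id, 0), edit.version)
    return latest


def share_revised(edits):
    """Fraction of features whose highest version is above 1."""
    latest = latest_versions(edits)
    if not latest:
        return 0.0
    revised = sum(1 for version in latest.values() if version > 1)
    return revised / len(latest)


if __name__ == "__main__":
    sample = [
        Edit("way/1", 1, "alice"),
        Edit("way/1", 2, "bob"),
        Edit("node/2", 1, "alice"),
        Edit("way/3", 1, "carol"),
        Edit("way/3", 3, "carol"),
    ]
    # Two of the three sample features were revised after creation.
    print(f"{share_revised(sample):.0%}")  # prints "67%"
```

A version count on its own says nothing about how large each change was or who made it, so it is only a coarse signal of quality.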

And, of course, in a crisis, maintaining reliable and accurate data and giving it to relief workers is of utmost importance. It’s a life or death matter. Really what I’m looking at is the 2015 earthquake, the response, and how it prompted a crisis, and then a resolution, in OpenStreetMap and in HOT, the Humanitarian OpenStreetMap Team. Measuring the number of mappers and the increase in mapping activity around the earthquake. The social and communication processes that addressed the problem of the increased number of users. And then whether or not the novice mappers, the new mappers, including novel users, had a net positive or negative impact on the project.

And so I’m sure many of you were involved. But a quick background. The Nepal earthquake hit in April of 2015. There were around 8,000 reported dead. There was widespread damage across the country, in Kathmandu and also in many of the rural outlying areas. There was a huge relief effort that came in immediately from around the world, both in terms of NGOs, international NGOs and relief agencies, and government responses. And not to get too much into the politics of it, but it was exacerbated by the fact that Nepal had been in a constitutional crisis for the previous year before that. So their government was not in a state to be able to respond effectively to the crisis. In addition, Nepal has an amazing OpenStreetMap program called Kathmandu Living Labs. So they really took ownership of being the on-the-ground agency for working one-on-one with the relief agencies. They would print off maps on a daily basis for the Red Cross as well as Doctors Without Borders to give up-to-date information as it came in and verify information on the ground. So they were critical to this effort. Having that local relief agency already set up and running was pretty spectacular. And so if you look at the response around the world, you see it’s actually a pretty incredible response. 
So, unfortunately, this chart is logarithmic. That red box is the date of the Nepal earthquake. And it shows the number of people who mapped in Nepal in the ten days prior and the ten days after. And what those numbers show, it’s a little bit hard to read, but there was actually a tenfold increase in the amount of mapping activity, in terms of changes, adds and deletes, pretty consistently, in the ten days after compared to the ten days before. So a huge response. Even more impressive, if you take all of the people who had mapped in Nepal, who had added OpenStreetMap data in the previous ten years of OpenStreetMap, back to the inception of OpenStreetMap, there were about a thousand more users who joined in the ten days after the earthquake and started mapping. So there were more people who joined OpenStreetMap after the earthquake and mapped than there were in the previous history of mapping in Nepal. So the amount of response was pretty incredible from around the world.

Which is great, except, of course, it also creates a problem, in that at the time, the infrastructure wasn’t there to accommodate that widespread amount of new users. And just some of the discussions that were going on were thinking about, well, how do we change what we’re doing and who we are letting in? So there was one category of people who were interested in maybe making sure that we don’t allow new mappers, novice mappers who maybe don’t understand the issues of data quality, to map in the event of a crisis. And then, of course, there’s the opposite side of the group. The group that were saying, well, the whole point of OpenStreetMap is to be open, and we need to maintain its openness, but we need to have a way to both balance the open infrastructure and maintain that critical response. And so there was a member who said, we have to adapt to an awesome contribution. We need more people; at the same time, we have to adapt various ways of crowdsourcing as we see problems arise. 
We saw a lot of messages come across the HOT email list with subjects of “Argh!” From frustration. But it was the new user opportunity. And a new user wrote in, “I am a newcomer. I’m not a GIS scientist and I learned on the job. However, I think I’m perfectly capable of making a contribution and the OSM process isn’t clear.” To talk about that more, I’m sure you are aware of how HOT adapted.

But one of the first questions I wanted to understand is, did the new contributors have a generally positive or negative impact? So if we actually look at the number of users before and after, users who had more experience, who had joined OpenStreetMap before the earthquake, in general contributed more on average: a little over twice as much on average as new users. Which you might expect. However, the percentage of features that were modified or deleted or changed was slightly higher for the more established users, the users who had joined OpenStreetMap before, than for the ones who had joined after. So 20% for users who had joined after, sorry, joined before the earthquake. 12%, almost 13%, for those who had joined after. Now, that’s not a strict measure of data quality, but it indicates that the problem might have been slightly overblown. Of course, there are also other issues with positional accuracy in data quality, which is that we don’t know how inaccurate the data was. So was it a small change or a large change? But overall there wasn’t a substantive difference in terms of data quality between older users and newer users. And so, yeah. In general, a much more active response.

And so the solution from the Humanitarian OpenStreetMap Team was both a social and technological change. Socially, they created a bunch of new resources and had experienced reviewers check work. You have the option to request work to be checked in the event of a mapathon. 
And a technical change: implementing the Tasking Manager, which is now on Tasking Manager version 3, to provide specific instructions and guidance to users. So it’s been incredibly successful. In the recent hurricanes, we have seen that the mapping activity has been much, much faster in terms of response. So really, I think what we see here is a classic example of a crisis implementing change in both the technical side and the social side, and in coming up with a new solution.

And one of the reasons why I wanted to talk about this today is that there are some very large potential humanitarian threats that could be occurring. And so I think maybe it’s time to start thinking about planning for the unthinkable. I’m not trying to have a doom and gloom scenario, but in terms of humanitarian threats today, and the world that we live in and the political situation, I think it’s a realistic idea to think about how we respond to the number of disasters and the number of threats that we have. And even the scale of threats that we have. Not to end on doom and gloom: on a slightly lighter side, of course, crisis is opportunity. Crisis is what makes things change and bloom and turn into something new. So I wanted to leave you there with that thought. Don’t be afraid of new contributors. But also that every time we come into a crisis, we come up with hopefully a better solution from our community. Thank you. That’s my time. Questions?

[ Applause ]

We have five minutes for questions. So anyone with questions for David Parr?

AUDIENCE: I’m interested in those percentages of modified edits. It was interesting to think that the people there would have 20% change and 12% for the folks afterward. I feel like the denominator on that, how do we know that someone put eyes on all of the features? Or how do you know that you have a comprehensive set of review? Like do you feel like you could be potentially missing things? 
Or do you feel like those numbers are really comprehensive?

In terms of the amount, so the question is, do I think that I’ve found all the reviewed data? What I looked at is data that was created, data that was edited, after the earthquake. So I looked for version increases in any data that was created, any data in Nepal that was in the region.

AUDIENCE: So I guess the question is, there was so much data created. How do we know there was a secondary set of eyes put on all of that new data? Could there just be a bunch of raw data that hasn’t been edited yet because there hasn’t been sufficient review?

Absolutely. The question is, how do we know the data is created. I don’t. I looked at the data ten days before and after the earthquake. So it’s certainly possible that there were longer term changes that were implemented. People went back and reviewed the data as a longer term process. So the answer is, I could go back and compare what data has increased in version number since that ten-day period to get a sense of it. Yeah.

AUDIENCE: Thank you.

AUDIENCE: Hi. So you’re using the increased version number as a proxy for data quality.

Right.

AUDIENCE: Does that include different contributors increasing the version numbers, or could it be the same contributor editing the same object a number of times? Do you have a sense, is there some sort of literature that points us in the direction of which one is a better measure of quality, or which one are you looking at, or both?

There is a paper, I don’t remember the author off the top of my head. There is a paper that looks at version number and who created it and when. And their actual theory is that there’s a density metric of improvement. That for things in a given area, it’s not just version number itself, but version number and lots of changes in the same area. 
So the idea is that if there’s a group of people or an individual looking at a particular area, and they’re going through and making changes, that area is going to have a higher data quality.

[ Away from microphone ]

Yeah. I didn’t go through and measure the number of users making those individual edits. Yeah. So I don’t know. That’s a good question. That could be a good follow-up paper. Yeah. That’s good.

One more short question? No? Thank you.

Okay. Thanks.

[ Applause ]