Planimetrics Import: Denver Metro Area – Transcription
Thanks, everyone. Up next, we have Mike Thompson and Ashley Summers. I was told they would leave lots of room for Q&A, which I'm excited about.
Ashley Summers: Good morning. I'm here with Mike Thompson, and we are talking about a project we have been working on with our colleagues that we are really excited about. First, a bit about the organization I'm from: I work for the Denver Regional Council of Governments (DRCOG). We bring together local governments around the Denver metro area to plan for the future. We help these local communities get together, collaborate, and brainstorm about urban development and transportation into the future, to make sure we are planning for the best quality of life in the region, for the people who are here now and those who come later.
As I said, we serve a nine-county area around the Denver metro region, so this is the area we are paying attention to. In particular, my team at DRCOG manages analysts and developers who work to understand the environment in this area, so we can forecast what the built environment needs to be in the future to maintain the quality of life that we have.
One way we do that is by facilitating really large data projects. For example, for the last 15 years we have had an aerial imagery project, where DRCOG brings together 50 different partners in the region to purchase high-resolution imagery – like 3- and 6-inch resolution – so we can look across this 50-square-mile area to understand change over time.
This model was so successful, and we were able to provide such valuable data to so many partners with significant cost savings (because we were doing a bulk buy), that we asked: why not do this with other data sets?
So a few years ago, we started thinking about what we could add on, and we landed on planimetric features. These are anything you can see in imagery and delineate – features of the built environment: outlines of buildings, sidewalk lines, parking lots. We wanted to identify these from our imagery, and we decided to cover 1,100 square miles around the Denver metro, digitizing off our 2014 aerial imagery. Just for fun, how many buildings do you think are in that purple blob? That is very, very close. A little bit lower, but very good. And – go ahead. That is a good guess. There are a little over 600,000 buildings in this area, and we digitized them all.

To tell you a little about our methodology: DRCOG is a small shop; our data development team is four people. So what we do on these giant data development projects is pay a vendor. We facilitate all of these different partners from our community, bringing together their needs and also their money, and then we go out and find a vendor that can do the work for us. We had a vendor that had captured three-inch and six-inch resolution imagery in this area, and then – there we go – we paid them to manually draw in all of these features over 1,100 square miles. This includes building roofprints – you can see the level of detail we captured – plus things like parking lots and sidewalks, as I mentioned earlier. All of this is with partners that contributed funding.

After we got the data in-house, we put it up on the web, and our partners and others in the community validated it, so we feel confident the data is quality. When we originally went about this project, we thought it was great for municipal mapping: it provides context, helps master plans and community development, and supports things like analyzing sidewalk availability.
When we realized how high-quality the data was, we realized there is more utility here, for building models and for so many emerging uses, because of the level of detail. One of the more novel things about this program is not just that we were able to leverage all of these community needs and find a way to actually buy this data, but that before we got started, we sat down with all the partners and said: how about we not sell this? How about we give it away? So we did. After we finished, there was no waiting period, and there was no plan to recoup costs off of public data. We're public agencies; this was purchased with taxpayer dollars. As soon as the data was in-house, we put it online and let people take it. (Applause). Woo hoo! We are very proud of that; it was not easy.

But we wanted to go a step further. Not everybody in the world goes to the DRCOG website, sadly, and we know this. We wanted to find a way to provide more value. There's a lot of value here, and we want to make sure it gets out into the community. So our thought was: we should give this to OSM. We thought it could not only give some visibility to what we're doing, but also create a partnership with people we know can provide value back to us, and be a great relationship. We really wanted to see this data show up on the map. So we reached out to OSM volunteers as soon as our first project was over, which was last summer, and then we started learning about the guidelines for bulk imports and what that means: what kind of documentation should we do, what kind of community discussion do we need to engage in? We started a wiki page, and we decided that even though we collected nine feature types as part of our planimetrics program, we were going to do building footprints first, because that's a valuable asset we think more of you need.
So that was the pilot we worked with, and we spent the next year trying to figure out what it would mean to get 600,000 buildings into OSM in a way that values the existing OSM data. To talk more in detail about that, I will pass it to Mike.
Mike Thompson: Thanks, Ashley. A great thing about OSM in general, and this partnership in particular, is the relationships that have been established. I don't think we had met in person before this morning; I was saying, before I presented, that I didn't know if my co-presenters were here or not. It is a wonderful partnership. And just because I'm up here speaking doesn't mean I did the bulk of the work. We had a lot of other people on the team, in particular Chad Hammond. Is Chad here? A lot of the stuff I'm talking about, Chad did the technical heavy lifting on, along with Russ, Jim, and Curtis Brown.

This is a work in progress, and we want your feedback in the Q&A session. This is our first import as OSM volunteers, so any how-to, or indication that we need to fix something, would be greatly appreciated. If you don't have time in the Q&A, catch us afterwards.

The general process was: we got the DRCOG data and conflated it with other open data, including some address points. We learned that the nearest building may not be the right match for an address, and we got feedback from other OSM communities on how we can do that better. We also made a distinction between garages and sheds, which wasn't part of the DRCOG schema. Here's the way we map the building type tag. These are different schemas, with different ways of classifying things, and you can see the relative counts, which match up to the number that Ashley mentioned. Here's how we mapped the attributes in the GIS data from DRCOG to the OSM tagging.

We – Chad – put the data into Postgres and sliced it into smaller chunks, if you will. Some magic happens to ensure that each building appears in only one chunk: buildings were not split if they straddled a chunk boundary. I think it is random which chunk a straddling building goes to, the one on the north or the south, but there is only one, and that is an important point. And for multi-part buildings, we ensured that all the parts resided in the same grid cell. So, speaking of multi-part buildings, this is a tiny thing that I did.
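The chunking step described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the team's actual code: it assumes each footprint is assigned to the single 1 km grid cell that contains its centroid, so a building straddling a boundary still lands in exactly one chunk. The coordinates, IDs, and `CELL_SIZE` are made up for the example, and real data would be in a projected, meter-based CRS.

```python
import math

CELL_SIZE = 1000  # assumed 1 km x 1 km grid, meter units

def centroid(ring):
    """Simple vertex-average centroid of a polygon ring [(x, y), ...]."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return sum(xs) / len(xs), sum(ys) / len(ys)

def cell_for(ring):
    """Grid cell (column, row) that 'owns' this footprint."""
    cx, cy = centroid(ring)
    return int(math.floor(cx / CELL_SIZE)), int(math.floor(cy / CELL_SIZE))

# Hypothetical footprints; bldg-2 straddles the boundary at x = 1000
# but its centroid puts it in exactly one cell.
buildings = {
    "bldg-1": [(10, 10), (30, 10), (30, 30), (10, 30)],
    "bldg-2": [(990, 990), (1010, 990), (1010, 1010), (990, 1010)],
}

chunks = {}
for bid, ring in buildings.items():
    chunks.setdefault(cell_for(ring), []).append(bid)
```

Because ownership is decided by a single point, no footprint is ever split or duplicated across chunks, which is the property Mike calls out as important.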
In the DRCOG data, the schema is flat. Take a multi-level building – say, a hotel with a one-story lobby and all the rooms in a two-story tower. Those are two separate pieces of geometry, two separate records, but they share a building ID. The proper way to represent that in OSM is to use a relation. So I wrote a Python script using a piece of open source software called Shapely. I see some people smiling that have also used Shapely. It's a great piece of software, maintained by a guy in Fort Collins, Colorado – Sean, who I don't think is here today. I grouped all the buildings that have the same ID into a relation, created a dissolved outline, and put those all together. The outline layer is shown there, and the parts – the original buildings in the DRCOG data – became "part" members in that relation.

We based our process, the process we're going to ask additional volunteers to go through, on what Los Angeles did in their building import – (speaker far from mic). So we have a custom instance of the tasking manager, which lives on Chad's GitHub account, or his AWS account. The basic process: we will ask volunteers to download, through the tasking manager, a one-kilometer square of OSM data and the corresponding DRCOG data. If there's a conflict – if the building exists in both – then we will use the replace geometry function to transfer, or conflate, or combine the attributes. The two layers, the OSM layer and the DRCOG layer, are merged in JOSM and uploaded. If there are conflicts that cannot be resolved through satellite imagery, keep track of those and we will do field survey work. One of the things we still need to do is get the converted data, the one-kilometer squares of DRCOG data, out to you, the community, to verify. This is where we're at right now: we are seeking input, and then we will move forward with that input. Questions, comments, suggestions?
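The multi-part grouping Mike describes can be sketched with Shapely. This is a minimal illustration under assumed data, not the actual script: parts that share a building ID are grouped, and `shapely.ops.unary_union` dissolves them into the outline geometry that would become the relation's outline member, with the original footprints as its parts.

```python
from collections import defaultdict

from shapely.geometry import Polygon
from shapely.ops import unary_union

# Hypothetical two-part hotel: a one-story lobby and a two-story tower
# share a building ID and should become one relation; the shed is a
# simple single-part building.
parts = [
    ("hotel-42", Polygon([(0, 0), (4, 0), (4, 2), (0, 2)])),  # lobby
    ("hotel-42", Polygon([(0, 2), (2, 2), (2, 6), (0, 6)])),  # tower
    ("shed-7", Polygon([(10, 0), (12, 0), (12, 2), (10, 2)])),
]

# Group footprints by their shared building ID.
grouped = defaultdict(list)
for bid, geom in parts:
    grouped[bid].append(geom)

# Dissolve each group into one outline; the members of `grouped[bid]`
# would become the "part" members of the OSM relation.
outlines = {bid: unary_union(geoms) for bid, geoms in grouped.items()}
```

Single-part buildings pass through unchanged, since the union of one polygon is itself.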
Hello, a quick question. I noticed you guys actually had building IDs associated with these – like, an area with a bunch of buildings. I'm curious about your plan for maintaining that over time: you have attributes of the different buildings that change, buildings that grow, and buildings that are demolished. I'm curious how you track the lineage between the data in OpenStreetMap and the data from the government.
Good question. One reason there is not data available in the tasking manager yet is that, since the original project, we have an update problem. We have some momentum from DRCOG and its partners to do updates. We don't re-digitize every building; the fabric and the structure are laid down, and we digitize the changes, the geometry changes that are out there. So we're not really sure. I think we're going to find out a lot as we go through this initial process, as volunteers start checking the data in and give us feedback on what the conflation is like, and on how we might need to break up the data even further to check in buildings over time. So I think we're open to feedback on that particular process. Over the past year, we have spent most of our time thinking about how we were going to do it the first time.
I recall that you told us on the call that your vendor would give three separate deliverables.
Yes, that is correct. We are tracking the building ID here, and our vendor has just delivered the next set for us. We have one dataset that is the original from 2014, we have a dataset that shows everything on the ground in 2016, and we have a change dataset that highlights everything that has happened: building demolitions, new buildings, and attribution changes. Somehow, we will work all of that out.
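With a stable building ID across deliverables, the change dataset described above amounts to a keyed diff between two snapshots. This is a hedged sketch with invented IDs and attribute values, not the vendor's actual process:

```python
# Hypothetical snapshots keyed on the stable building ID, mapping
# each ID to its (simplified) use attribute.
snapshot_2014 = {"b1": "residential", "b2": "commercial", "b3": "industrial"}
snapshot_2016 = {"b1": "residential", "b2": "public", "b4": "residential"}

# New buildings: IDs present only in the later snapshot.
new_buildings = sorted(set(snapshot_2016) - set(snapshot_2014))

# Demolitions: IDs present only in the earlier snapshot.
demolished = sorted(set(snapshot_2014) - set(snapshot_2016))

# Attribution changes: IDs in both, with differing attributes.
retagged = sorted(
    bid
    for bid in set(snapshot_2014) & set(snapshot_2016)
    if snapshot_2014[bid] != snapshot_2016[bid]
)
```

The same keyed-diff idea is what would let OSM-side edits be reconciled against future DRCOG deliveries, as long as the building ID is carried through the import.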
A couple suggestions. I don't know if you had the same problem that I did when we did the Seattle import: we had a lot of address nodes that did not fall inside the building polygon. I found that if I used the parcel data to see which building is on the parcel, I could put the address node inside the building and attach it to the building, which conveys the detail much better. And one last thing: my good friend Paul McLellan over there is an address expert. In his county, King County, they have a unique address ID that stays stable. You can put that in there, and then if the address changes, you can get that data and update your address data pretty easily.
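The parcel trick suggested above can be sketched with a plain point-in-polygon test. This is an illustrative sketch with made-up geometry, not the Seattle import's code: an address node sits in a parcel's yard rather than on the footprint, so the parcel is used to link it to the building on that parcel.

```python
def point_in_ring(x, y, ring):
    """Ray-casting point-in-polygon test; ring is [(x, y), ...]."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does the edge cross the horizontal ray to the right of (x, y)?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical parcel and building; the address node falls in the
# parcel's yard, not inside the building footprint itself.
parcel = [(0, 0), (10, 0), (10, 10), (0, 10)]
building = [(2, 2), (6, 2), (6, 6), (2, 6)]
address = (8, 8)

in_building = point_in_ring(*address, building)  # direct match fails
in_parcel = point_in_ring(*address, parcel)      # parcel lookup succeeds
```

When the direct test fails but the parcel test succeeds, the address can be attached to the (single) building on that parcel instead of being left as a stray node.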
Yes, good point on the addresses that may not be on the building, or even near the – (speaker far from mic) – we are looking into that.
We did a similar thing in Grand Junction, Colorado earlier this year. We would be happy to share our experience, at a smaller scale, of how that happened and the process we went through. We also have a custom tasking manager, if you need more information on the data. I just wanted to offer that.
Yeah, let’s talk afterward.
Hi. My question is, with JOSM's replace geometry operation, is there a tag merging strategy that occurs?
Yes, according to Cliff, yes. The primary purpose, though, is that you want to preserve the history. If someone mapped a building here in Denver a few years ago, we don't want to wipe out that history; it will still say you are the original person that created it, and we are just version two. So it reuses the ways and the nodes – as many nodes as it can – to make the new geometry.
How does your vendor know the building use?
A really good question. They are interpreting everything from the aerial imagery, and they are not local. They don't know very much, which is why our categories are pretty broad. They can tell industrial from residential and commercial, and in some cases they can tell public buildings, because they can say, that's a school because there's a football field next to it. So they are mainly going from a visual interpretation of imagery. They do write that down in the dataset they give us: they will let us know if they had additional sources to tell them what the building was. I would say you have to take it with a grain of salt, knowing how that attribution was put in. Yeah, absolutely, overwriting that use with something more accurate is a great way to go. And keep in mind that we had our local GIS staff and the staff of our partners validating, so there's an extra set of eyes on it since the vendor gave it to us. But this is a huge area and very detailed data, so all the value this community can bring to this data, and to the map, would be fantastic.
I have a question. Having gone through this process, what would you say is the feasibility, if you are working with a private vendor, of having them supply data in a format that matches the OSM schema?
I think that's doable. They work for us, right? So we just have to ask them to do it. And now that we have created this partnership and we know more about what it needs, we can keep that in mind as we sign future contracts. And if this import goes well, we have nine other features, including sidewalk centerlines for 900 square miles, that we can plug in.
Cool. Having said that, well, the first time through this was painful, because you didn't know – at least I didn't know – what we were doing: is the license good enough, what does the format look like, do we have to write software, how will we figure it out? So, yeah, if the vendor can write it, great. But I think we learned a huge amount going through this.
I'm curious: you did this at the regional government scale. What are your thoughts on whether that's the right scale? If you scaled this out to other regions, at the regional government scale, a lower level, or possibly a higher level, what is your takeaway on that?
I think the regional scale is a good one, just because a lot of agencies like DRCOG have a history of bringing together agencies; that is what it is mandated to do, federally. So we already have a relationship with all of these other communities that we can bring together, and it works out pretty well. I think a little bit smaller than that also works, but maybe it doesn't have – (speaker far from mic) – and a larger scale, like the state level, is even harder. The more partners you add in, the more needs you are accommodating, the more disparate you get, and the harder it is to get to a successful project. We are at a sweet spot, and I think we were lucky in a lot of different ways to pull this off. Now that we have momentum, we have to keep it going to show that this really is something valuable for the community.
We are considering, once we get everything into the tasking manager, chopping it up into separate projects – it is discouraging to volunteers to see thousands of tasks – maybe by municipal area, so they can say that they've got something done. Any other questions? We have 10 whole minutes of leeway here. We didn't talk that fast.
I am curious about the population of the area, and the – (speaker far from mic).
You are very close. In the metro Denver area, it is 2.4 million, and we are expecting a little over – close to four million by 2040.
Any other questions?
How about entryways? This is for people that have done imports, too. I think putting the address at the center of the building, and keeping it with the building, is great. How much work has been done on entryways? Is that way down the line? Certainly for emergency management datasets, you know, at the state scale, we are interested in entryways for emergency access. I'm curious about thoughts from the crowd. We know – yeah.
If I’m not mistaken, OSM practice can support that.
(Speaker far from mic).
That would be a field survey process that would run parallel to this. But probably can't – (speaker far from mic).
While you are out there getting entryways, count the units for us, too.
You can count the number of mailboxes in front of the buildings as units.
Perfect, I will do it tonight. So, with entryways, it is more and more common to add station entrances for public transport, and Transitland is working on trying to boost that effort. So go ahead and add your station entrances, wherever you are in the world, because it will definitely help out with multi-modal routing.
Any other questions or comments?
Just one more comment: the doors can all be gathered through field observations; we get that a lot in our street-side imagery. We have OSM cars and, from what I see, they are currently busy extracting signs – road signs and all that stuff – but you can also extract doors from that. So I wonder, do any of the companies have plans to do that? Because this would be one of those examples where it would take a long time for people to complete, and this is where machine learning can really step in and improve the data quality and precision.
In the GIS community, Mapillary reached out to our listserv recently, and the Mapillary mapathon is coming up; maybe we can combine this as well and see how far we can push it. I'm actually really excited about it.
Let’s get on the meet-up. Yeah.
The local community does have a meetup page that OSM Colorado heads up. You are welcome to go and check it out.
You can go now. Thank you so much for listening and having great questions. (Applause).
Live captioning by Lindsay @stoker_lindsay at White Coat Captioning @whitecoatcapxs.