Partnerships Lightning Talk, Tim Smith

Partnerships Lightning Talk, Tim Smith – Transcription

Next up is Tim. If Microsoft comes, they really come at the same time. So here is Tim.

All right. All in. Hi, everybody, I'm Tim. I'm also a program manager on Bing Maps. And this talk is the mappings between the maps. So what does that really mean? That means we want to connect the dots. We look at a lot of different... what's that? Oh, no. I'll give you extra time. Sweet. There. That's really good.

So we work with a whole bunch of different datasets and we really want to be able to line them up, get the connections in, and see where they agree and disagree. At a really high level, I'll jump through a really quick example. Here is the city of Boulder in OSM. We want to make kind of a logical map between that and, say, here is the same entity in Who's On First. Subtle differences, different metadata, but ultimately the same entity. And to take it one step further, here's the Wiki article for that. It might have some useful information. What we don't want, you know, is an article about large rocks. I like how Wiki decides to point that out. A more legitimate example is that we wouldn't want a county called Boulder. It might even be in the same place. These are the wants and don't-wants for all this matching.

So I'll walk through our quick little lightning-talk, four-step, easy recipe to get good matches. We need to bring everything down into kind of a uniform format that keeps the data that's relevant to matching and normalizes it so that, you know, everything can talk together. Additionally, there might be some gaps. Not all data sources have the same degree of richness. So we can actually pull out some things. For the Wiki example, we started crawling through text to find additional information: hierarchies, types, things that might not be good in the Wikidata blob, which is all too common. Next up, we want to figure out what all the best matches are.
We put this in a soup and go through a number of features, looking at the data from the hierarchy to the geometry to the attributes. That can get complicated when you're looking at everything in the world. As is the common topic this weekend, machine learn all the things. So we train models to determine the relevant weights for all these scores and determine all these matches.

So what can we do with this? We have tons of matches all around the world, of all sorts of types. Well, we do a whole lot of things with it. Measurements galore. Our bread and butter right now is comparing and contrasting the different values and things that different data sources have to offer. This almost was a talk of its own. We can look at gaps. We can identify duplicates, concordances and agreements and what they have to say about the world. Data and a whole lot more.

How does this differ from similar solutions that other companies have? One is scale, and particularly interesting to me is the interesting signals we have as Microsoft, and particularly as Bing: looking at how people search for this information, and what's really relevant.

Going into the future, what I would hope for from something like this is kind of a centralized mapping between all these open data projects. There's a lot of richness out there. You know, all these projects have their own purposes and their own angles of looking at the world. And I think connecting all those dots can really lead to a richer map for everybody. So if there are questions, feel free to save them for later, or there's an email up there. And thank you for your time.

[ Applause ]

Thanks, Tim.
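
The recipe described in the talk (bring records into a uniform format, score candidate pairs on attributes, geometry and type, then weight those scores) can be illustrated with a small Python sketch. This is only a minimal illustration under stated assumptions, not the Bing Maps implementation: the `Place` record, the three features, the 10 km distance scale and the hand-picked `WEIGHTS` are all hypothetical, and the fixed weights stand in for what the talk says a trained model determines. The coordinates are approximate and purely illustrative.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Place:
    """Uniform record keeping only fields relevant to matching (hypothetical schema)."""
    source: str
    name: str
    kind: str   # e.g. "city" or "county"
    lat: float
    lon: float


def normalize_name(name: str) -> str:
    """Lowercase and collapse whitespace so names from different sources compare."""
    return " ".join(name.lower().split())


def haversine_km(a: Place, b: Place) -> float:
    """Great-circle distance between two places, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a.lat, a.lon, b.lat, b.lon))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def features(a: Place, b: Place) -> dict[str, float]:
    """Per-pair scores in [0, 1] for attributes, geometry and type."""
    names = SequenceMatcher(None, normalize_name(a.name), normalize_name(b.name))
    return {
        "name": names.ratio(),
        "distance": math.exp(-haversine_km(a, b) / 10.0),  # assumed 10 km scale
        "kind": 1.0 if a.kind == b.kind else 0.0,
    }


# Hand-picked weights: an assumption standing in for learned model weights.
WEIGHTS = {"name": 0.4, "distance": 0.3, "kind": 0.3}


def match_score(a: Place, b: Place) -> float:
    """Weighted sum of the feature scores for one candidate pair."""
    f = features(a, b)
    return sum(WEIGHTS[k] * f[k] for k in WEIGHTS)


def best_match(query: Place, candidates: list[Place]) -> Place:
    """Return the candidate with the highest match score."""
    return max(candidates, key=lambda c: match_score(query, c))


# Illustrative records: a city from one source, a city and a county from another.
osm_boulder = Place("osm", "Boulder", "city", 40.015, -105.27)
wof_city = Place("wof", "boulder", "city", 40.02, -105.25)
wof_county = Place("wof", "Boulder County", "county", 40.09, -105.36)

print(best_match(osm_boulder, [wof_city, wof_county]).kind)
```

With these made-up records the city candidate outscores the county, which mirrors the "wants and don't-wants" example from the talk: a nearby entity with a similar name but a different type should not win the match.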