OSHDB Version 1.0 Has Arrived

Featured Photo: Ohsome dashboard interface for Heidelberg, Germany.

In the words of Confucius, “The man who moves a mountain begins by carrying away small stones.” As we release OSHDB (OpenStreetMap History Database) Version 1.0, we look back on versions 0.5, 0.6, and 0.7, and on all the other small improvements to our historical OpenStreetMap database, as the small stones that allowed us to move the mountain.

The mountain itself was identified by our Scientific and Managing Director Prof. Dr. Alexander Zipf back in 2010. According to Benjamin Herfort, HeiGIT’s current product owner for the ohsome and big data team, researchers at the time faced significant hurdles if they wanted to study OSM data over time. While analyzing the current state of OSM was fairly straightforward, building a framework for examining how the data had evolved proved complicated. “Every researcher had to find their own setup and their own way of crunching OSM data. That’s what we wanted to change.”

Prof. Zipf led the team in setting up a server so that researchers could probe historic OSM data without worrying about setup or data processing, leaving them free to concentrate on their research questions. With the release of OSHDB, users of historic OSM data no longer needed to be computer scientists, database engineers, or experienced in running a cluster. OSHDB would do that job for them through an intuitive API.

Since then, the changes to OSHDB have been small but meaningful, focusing on building solid software through improved internal documentation and testing. Instead of adding flashy features, releases have concentrated on making the software work well so that analysis stays accessible. Now, after five years of development and testing, the team is releasing OSHDB Version 1.0.

This version continues to let users visualize and explore the amount of data and contributions to OSM over time, starting from the beginning of OSM itself. The features in the OSM history data, ranging from country borders to buildings and turn restrictions, are not only interesting in their own right; they also support the investigation of data quality and regional quality comparisons and allow aggregated data statistics to be computed.

One of the most important aspects of the database is its usability. Two clicks on the ohsome dashboard allow any researcher, journalist, or citizen scientist to view the evolution of OSM data over time for any region. This temporal change can tell us about data quality as we evaluate saturation and check for currentness. In Heidelberg, for example, the number of buildings has not increased significantly since 2012, which suggests that building mapping there reached saturation around that time and that the data in the area is likely of high quality. To draw such conclusions, we also need to check currentness, that is, when entities were last changed or added. For more information on this application, make sure to read our blog post on the topic.

Graphs: Saturation Indicators for Heidelberg, Germany over time using OSHDB.
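To make the saturation reasoning more concrete, here is a minimal Java sketch of one possible heuristic: given yearly feature counts for a region, find the earliest year after which the year-over-year growth never again reaches a chosen threshold. This is our own hypothetical illustration, not the indicator used by the ohsome dashboard; the class name `SaturationSketch`, the method `saturationYear`, and the 2 % threshold are assumptions, and the counts in `main` are synthetic numbers, not real Heidelberg data.

```java
import java.util.Map;
import java.util.OptionalInt;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical sketch of a simple saturation heuristic; not the ohsome dashboard's method. */
public final class SaturationSketch {

    /**
     * Returns the earliest year from which every subsequent year-over-year relative
     * growth in the count stays below maxGrowth (e.g. 0.02 for 2 %). Returns an empty
     * result if even the most recent growth step is at or above the threshold.
     */
    public static OptionalInt saturationYear(SortedMap<Integer, Long> countsByYear, double maxGrowth) {
        Integer[] years = countsByYear.keySet().toArray(new Integer[0]);
        OptionalInt candidate = OptionalInt.empty();
        // Walk backwards from the most recent interval and stop at the first large jump.
        for (int i = years.length - 1; i >= 1; i--) {
            long before = countsByYear.get(years[i - 1]);
            long after = countsByYear.get(years[i]);
            double growth = before == 0 ? Double.POSITIVE_INFINITY : (after - before) / (double) before;
            if (growth >= maxGrowth) {
                break;
            }
            candidate = OptionalInt.of(years[i - 1]);
        }
        return candidate;
    }

    public static void main(String[] args) {
        // Synthetic, illustrative building counts per year (NOT real OSM data).
        SortedMap<Integer, Long> counts = new TreeMap<>(Map.of(
                2008, 100L, 2009, 400L, 2010, 900L, 2011, 1000L,
                2012, 1010L, 2013, 1015L, 2014, 1018L));
        System.out.println(saturationYear(counts, 0.02));
    }
}
```

A low growth rate on its own does not prove completeness, which is why the post pairs saturation with a currentness check.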

With a simple API designed for intuitive use across a wide range of analysis queries, users can work with a lossless dataset that includes deleted objects as well as erroneous and partially incomplete data. Data can be viewed as snapshots at specific points in time or as the full history of a region, tag, or entity type, supporting a wide range of use cases.

The full list of changes is available on GitHub, with most improvements occurring in the “boring bits” that users will only notice through a smoother analysis process. We would like to highlight one change, however, that will prove useful to many researchers. New OSHDB filters allow practitioners to filter entities by the shape of their geometry. This feature addresses a common discussion in the OSM community: how to deal with imperfectly mapped objects.

When beginners add objects to OSM, they may make minor mistakes when adding new buildings, for example by not drawing the corners of rectangular buildings at right angles. Such small errors matter for data quality assessment, a major application of OSHDB. With Version 1.0, users can filter for rectangular or not-quite-rectangular objects and can thus identify distorted shapes in OSM. Filters like this one allow researchers to assess the quality of OSM data for a region over time.
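The post does not describe how OSHDB measures shape, so the following is only a rough illustration of the idea: a hypothetical Java sketch (our own code, not OSHDB's filter implementation or API) that checks whether every corner of a building outline is close to a right angle or a straight line. It assumes the outline has already been projected to planar coordinates such as metres, that the closing vertex is not repeated, and that a tolerance in degrees is an acceptable stand-in for whatever measure OSHDB actually uses.

```java
import java.util.List;

/** Hypothetical sketch: are all corners of a polygon close to 90 or 180 degrees? */
public final class SquarenessSketch {

    /** A vertex in a local planar coordinate system (e.g. metres), not raw lon/lat. */
    public record Point(double x, double y) {}

    /** Largest deviation (degrees) of any corner from the nearest of 90 or 180 degrees. */
    public static double maxCornerDeviation(List<Point> ring) {
        int n = ring.size();
        if (n < 3) {
            throw new IllegalArgumentException("a polygon needs at least 3 vertices");
        }
        double worst = 0.0;
        for (int i = 0; i < n; i++) {
            Point prev = ring.get((i - 1 + n) % n);
            Point curr = ring.get(i);
            Point next = ring.get((i + 1) % n);
            double ax = prev.x() - curr.x(), ay = prev.y() - curr.y();
            double bx = next.x() - curr.x(), by = next.y() - curr.y();
            // Unsigned angle between the two edges at this corner, in [0, 180] degrees.
            // A reflex corner of 270 degrees shows up as 90, which is still "square".
            double angle = Math.toDegrees(Math.atan2(Math.abs(ax * by - ay * bx), ax * bx + ay * by));
            double deviation = Math.min(Math.abs(angle - 90.0), Math.abs(angle - 180.0));
            worst = Math.max(worst, deviation);
        }
        return worst;
    }

    /** True if every corner is within toleranceDegrees of a right or straight angle. */
    public static boolean isNearlyRectangular(List<Point> ring, double toleranceDegrees) {
        return maxCornerDeviation(ring) <= toleranceDegrees;
    }

    public static void main(String[] args) {
        List<Point> square = List.of(new Point(0, 0), new Point(10, 0), new Point(10, 8), new Point(0, 8));
        List<Point> skewed = List.of(new Point(0, 0), new Point(10, 0), new Point(12, 8), new Point(0, 8));
        System.out.println(isNearlyRectangular(square, 2.0)); // true
        System.out.println(isNearlyRectangular(skewed, 2.0)); // false: one corner is about 104 degrees
    }
}
```

Counting the outlines that fail such a check, per region and per year, is one way to turn a shape filter into a data quality indicator over time.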

This filter, along with the many behind-the-scenes changes in this version and throughout the evolution of OSHDB, contributes to an accessible database that lets researchers interpret data in ways that were not possible in 2010, when OSHDB was only a vision in the minds of our team members. With the release of Version 1.0, we’re proud to offer a simple tool for the important task of data quality analysis. We look forward to carrying away many more small stones in the months and years to come.

Video: Evolution of OSM road network mapping in Heidelberg, Germany since 2007.

In the next few days, we will offer a range of content to celebrate and expand upon this release including demos, features, use cases, and insights into the development process. Keep up with us through our blogs and social media channels!
