The OpenStreetMap History Database (OSHDB) is the main data backend developed at HeiGIT for the ohsome OSM history analytics platform, which will make OSM’s history data more accessible for further analysis. OpenStreetMap (OSM) is a rich resource of freely available geographic information. However, the possibilities for analyzing OSM data on a global scale are limited because of the large amount of resources needed and the lack of easy-to-use analytics software. As a result, much of OSM’s huge information treasure remains hidden from researchers, journalists, community members and other interested people. The central idea of the OSHDB is to make this treasure available to a larger public and to develop further analysis functionality, e.g. for intrinsic data quality analytics. We achieve this by employing big data technology that we tailored to the specific needs of OSM’s history data.
In this blog post we briefly introduce the OSHDB API, which provides a Java interface to the OSHDB. The basic principle of the API is to
(a) select the data you are interested in and
(b) define functions that will compute the desired results from the selected data.
To speed up your analyses, the computation follows the MapReduce programming model and can therefore be parallelized on a compute cluster.
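To make the select/map/reduce principle concrete without an OSHDB installation, here is a minimal single-machine sketch in plain Java using a parallel stream. The `Feature` record and its sample values are hypothetical stand-ins, not OSHDB types, and the parallel stream is only an analogy for how the OSHDB distributes work on a cluster.

```java
import java.util.List;

public class MapReduceSketch {
    // Hypothetical, simplified stand-in for an OSM entity (not an OSHDB type).
    record Feature(String osmType, String highway, double lengthKm) {}

    // Hypothetical sample data.
    static final List<Feature> SAMPLE = List.of(
            new Feature("way", "motorway", 12.5),
            new Feature("way", "residential", 3.0),
            new Feature("node", null, 0.0),
            new Feature("way", "motorway", 7.5));

    static double totalMotorwayLength(List<Feature> features) {
        return features.parallelStream()
                // (a) select: keep only ways tagged highway=motorway
                .filter(f -> f.osmType().equals("way") && "motorway".equals(f.highway()))
                // (b) map: turn each selected entity into a number
                .mapToDouble(Feature::lengthKm)
                // (b) reduce: summing is associative (up to floating-point rounding),
                // so the stream may split the work across threads and combine partial sums
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(totalMotorwayLength(SAMPLE)); // 20.0
    }
}
```

Because the reduce step only needs an associative combination of partial results, the same logic can be spread over many workers, which is the property MapReduce relies on.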
Let’s look at some Java code to see how easy it is to deploy an analysis:
The first step is to establish a connection to an actual OSHDB database. Several database backends are already available; in this example we use the H2 backend.
OSHDBDatabase oshdb = new OSHDBH2("path/to/osm-history-extract.oshdb");
Next, we declare a MapReduce job and link it to the OSHDB:
MapReducer<OSMEntitySnapshot> mapReducer = OSMEntitySnapshotView.on(oshdb);
In this example, we use a snapshot view that enables us to take snapshots of the OSM data history at given points in time.
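Conceptually, a snapshot picks, for each entity, the version that was valid at the requested point in time. The sketch below assumes a simplified, hypothetical `Version` record (version number, start of validity, a deletion flag and a `highway` tag) and a history sorted by time; it illustrates the idea only and is not how the OSHDB stores or queries its data.

```java
import java.time.Instant;
import java.util.List;
import java.util.Optional;

public class SnapshotSketch {
    // Hypothetical version record: the entity's state from `validFrom` until the next version;
    // `deleted` marks a version that removes the entity (not an OSHDB type).
    record Version(int version, Instant validFrom, boolean deleted, String highway) {}

    // Hypothetical history of a single way, sorted by validFrom.
    static final List<Version> HISTORY = List.of(
            new Version(1, Instant.parse("2013-05-01T00:00:00Z"), false, "primary"),
            new Version(2, Instant.parse("2014-03-15T00:00:00Z"), false, "motorway"),
            new Version(3, Instant.parse("2014-09-01T00:00:00Z"), true, null));

    // Returns the version valid at time t, or empty if the entity did not exist (yet or anymore).
    static Optional<Version> snapshotAt(List<Version> history, Instant t) {
        Version current = null;
        for (Version v : history) {
            if (v.validFrom().isAfter(t)) {
                break; // this and all later versions were created after t
            }
            current = v;
        }
        return (current == null || current.deleted()) ? Optional.empty() : Optional.of(current);
    }

    public static void main(String[] args) {
        for (String day : List.of("2014-01-01", "2014-06-01", "2014-12-01")) {
            Instant t = Instant.parse(day + "T00:00:00Z");
            String state = snapshotAt(HISTORY, t)
                    .map(v -> "v" + v.version() + " highway=" + v.highway())
                    .orElse("not present");
            System.out.println(day + " -> " + state);
        }
        // 2014-01-01 -> v1 highway=primary
        // 2014-06-01 -> v2 highway=motorway
        // 2014-12-01 -> not present
    }
}
```

Note how the same way would only be counted as a motorway in the snapshots between March and August 2014, which is exactly the kind of change over time a snapshot view makes visible.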
Now, we can define our computation. Let’s sum up the total length of mapped motorways in the bounding box of the Maldives at monthly snapshots for the year 2014. As explained above in (a), we first select the relevant data.
OSHDBBoundingBox boundingBox = Country.getBoundingBox("Maldives");
mapReducer = mapReducer
    .areaOfInterest(boundingBox)
    .timestamps("2014-01-01", "2015-01-01", OSHDBTimestamps.Interval.MONTHLY)
    .osmTypes(OSMType.WAY)
    .where("highway", "motorway");
We do so by providing the Maldives’ bounding box as area of interest, the time range from 2014-01-01 to 2015-01-01 in monthly intervals as timestamps and by filtering for OSM ways tagged highway=motorway.
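The monthly timestamps can be pictured as the following list. This sketch assumes that both the start and the end date are included, which yields 13 timestamps for 2014; the helper `monthly` is hypothetical and not part of the OSHDB API.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class MonthlyTimestamps {
    // Hypothetical helper: start, start + 1 month, ... while not after end
    // (assumption: both endpoints are included).
    static List<LocalDate> monthly(LocalDate start, LocalDate end) {
        List<LocalDate> result = new ArrayList<>();
        for (int i = 0; ; i++) {
            LocalDate d = start.plusMonths(i); // offset from start, so no day-of-month drift
            if (d.isAfter(end)) {
                break;
            }
            result.add(d);
        }
        return result;
    }

    public static void main(String[] args) {
        List<LocalDate> ts = monthly(LocalDate.parse("2014-01-01"), LocalDate.parse("2015-01-01"));
        System.out.println(ts.size() + " timestamps: " + ts.get(0) + " ... " + ts.get(ts.size() - 1));
        // 13 timestamps: 2014-01-01 ... 2015-01-01
    }
}
```

Adding `i` months to the start date, rather than repeatedly adding one month to the previous result, keeps start dates late in the month (e.g. the 31st) from drifting to earlier days.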
Finally, we define the functions mentioned above in (b).
SortedMap<OSHDBTimestamp, Number> result = mapReducer
    .aggregateByTimestamp()
    .map((OSMEntitySnapshot t) -> Geo.lengthOf(t.getGeometry())) // map: length of each way
    .sum();                                                        // reduce: sum per timestamp
The map step is provided as a lambda expression that takes a snapshot of an OSM entity and returns the length of its geometry. The reduce step computes the sum of these lengths. Because we told the MapReduce job to aggregate by timestamp, this reduction is performed separately for each of our monthly snapshots. The result is therefore a sorted map that holds the total length of motorways as a number for each timestamp.
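What `aggregateByTimestamp()` followed by `map(...)` and `sum()` computes can be mimicked in plain Java. This is a sketch under stated assumptions: the `Snapshot` record and the sample data are hypothetical, and lengths are computed with the haversine great-circle formula on a spherical Earth, which may differ from how `Geo.lengthOf` is actually implemented.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Locale;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class AggregateByTimestampSketch {
    // Hypothetical snapshot: a timestamp plus a way geometry as {lon, lat} pairs in degrees.
    record Snapshot(LocalDate timestamp, double[][] lonLat) {}

    static final double EARTH_RADIUS_M = 6_371_000.0; // mean Earth radius (spherical model)

    // Map step (assumption: haversine on a sphere, not necessarily what Geo.lengthOf uses):
    // length of a polyline in meters, summed over its consecutive segments.
    static double lengthOf(double[][] lonLat) {
        double total = 0.0;
        for (int i = 1; i < lonLat.length; i++) {
            double lat1 = Math.toRadians(lonLat[i - 1][1]);
            double lat2 = Math.toRadians(lonLat[i][1]);
            double dLat = lat2 - lat1;
            double dLon = Math.toRadians(lonLat[i][0] - lonLat[i - 1][0]);
            double a = Math.pow(Math.sin(dLat / 2), 2)
                    + Math.cos(lat1) * Math.cos(lat2) * Math.pow(Math.sin(dLon / 2), 2);
            total += 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
        }
        return total;
    }

    // Reduce step per timestamp: group snapshots by timestamp and sum their lengths.
    // A TreeMap keeps the timestamps sorted, like the sorted map described above.
    static SortedMap<LocalDate, Double> sumByTimestamp(List<Snapshot> snapshots) {
        TreeMap<LocalDate, Double> result = snapshots.stream().collect(Collectors.groupingBy(
                Snapshot::timestamp, TreeMap::new,
                Collectors.summingDouble((Snapshot s) -> lengthOf(s.lonLat()))));
        return result;
    }

    // Hypothetical sample: one 1-degree way along the equator in January, two in February.
    static final List<Snapshot> SAMPLE = List.of(
            new Snapshot(LocalDate.parse("2014-01-01"), new double[][] {{0, 0}, {1, 0}}),
            new Snapshot(LocalDate.parse("2014-02-01"), new double[][] {{0, 0}, {1, 0}}),
            new Snapshot(LocalDate.parse("2014-02-01"), new double[][] {{1, 0}, {2, 0}}));

    public static void main(String[] args) {
        sumByTimestamp(SAMPLE).forEach((t, meters) ->
                System.out.println(String.format(Locale.ROOT, "%s: %.1f km", t, meters / 1000)));
        // 2014-01-01: 111.2 km
        // 2014-02-01: 222.4 km
    }
}
```

Grouping by timestamp before summing is what turns a single total into one value per monthly snapshot.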
Stay tuned for further updates on the OSHDB and the ohsome OSM history analytics platform, such as the forthcoming ohsome API, which will provide REST interfaces to interactively answer and visualize common, predefined research and analysis questions.
First results of applying OSHDB and ohsome can be found in:
Auer, M.; Eckle, M.; Fendrich, S.; Kowatsch, F.; Marx, S.; Raifer, M.; Schott, M.; Troilo, R.; Zipf, A. (2018, accepted): Comprehensive OpenStreetMap History Data Analyses – for and with the OSM community. State of the Map Academic Track, Milan.