OSHDB - OpenStreetMap History Data Analysis

  • Martin Raifer (Creator)
  • Rafael Troilo (Creator)
  • F.-B. Mocnik (Creator)
  • Moritz Schott (Creator)



A high-performance framework for spatio-temporal data analysis of OpenStreetMap full-history data. Developed by HeiGIT as part of the ohsome project. The code is hosted on GitHub: https://github.com/giscience/oshdb.

The OSHDB makes it possible to investigate how the amount of data in the OpenStreetMap project, and the contributions to it, have evolved over time. It combines easy access to the historical OSM data with high querying performance. Use cases of the OSHDB include data quality analysis, computation of aggregated data statistics and OSM data extraction. The main functionality of the OSHDB is explained in the first steps tutorial.

OpenStreetMap History Data

OpenStreetMap contains a large variety of geographic data, differing widely in scale and feature type. OSM contains everything from single points of interest to whole country borders, from concrete things like buildings to more abstract concepts such as turn restrictions. OSM also offers metadata about the history of the data and the modifications made to it, which can be analyzed in a multitude of ways. Because of its size and variety, the options for working with OSM history data are limited, and easy-to-use analysis software is lacking. A goal of the OSHDB is to make OSM data more accessible to researchers, data journalists, community members and other interested people.

Central Concepts

The OSHDB is designed to suit a wide spectrum of potential use cases and is therefore built around the following central ideas and design goals:

  • Lossless Information: The full OSM history data set should be stored in the OSHDB and be queryable, including erroneous or partially incomplete data.
  • Simple, Generic API: Writing queries with the OSHDB should be simple and intuitive, while at the same time flexible and generic enough to allow a wide variety of analysis queries.
  • High Performance: The OSM history data set is large, so the data must be stored, accessed and processed efficiently.
  • Local and Distributed Deployment: Analysis queries should scale well from data explorations of small regions up to global studies of the complete OSM data set.

The OSHDB separates data storage from computation. This makes it possible to use the MapReduce programming model to analyse the data in parallel, optionally on distributed databases. A central idea behind this concept is to bring the code to the data.
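
The MapReduce idea described above can be illustrated with a minimal Python sketch. It does not use the OSHDB API: the partition layout, the record format and the names PARTITIONS, map_partition, merge and count_buildings are assumptions made purely for illustration. Each partition stands in for one chunk of stored data, the map step is evaluated per partition and returns only a small partial result, and the reduce step merges those partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical toy data (not real OSM data): each partition stands in for one
# chunk of the data store and holds (timestamp, tags) records.
PARTITIONS = [
    [("2019-01-01", {"building": "yes"}), ("2020-01-01", {"building": "yes"})],
    [("2020-01-01", {"building": "house"}), ("2020-01-01", {"highway": "path"})],
    [("2019-01-01", {"highway": "residential"})],
]


def map_partition(records):
    """Map step: count building records per timestamp within one partition."""
    return Counter(ts for ts, tags in records if "building" in tags)


def merge(left, right):
    """Reduce step: combine two partial per-timestamp counts."""
    return left + right


def count_buildings(partitions):
    # Evaluate the map step on every partition in parallel, then merge the
    # partial results. Only the small Counter objects are combined, not the
    # underlying records.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce(merge, partials, Counter())


result = count_buildings(PARTITIONS)
```

In a distributed deployment the thread pool would be replaced by workers located next to the stored partitions, so that only the compact partial results travel over the network. This is one way to read "bring the code to the data".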
Date made available: 16 Sept 2021
