Test Case Title |
TC5.10 - Open Street Map ingestion process (OSM2SM) |
Goal |
As administrator, I can Filling the Knowledge Base with a street graph obtained from the Open Street Map. |
Prerequisites |
Installed Linux Installed Osmosis Installed Sparqlify PostgreSQL with PostGIS extension Osmosis simple schema database |
Expected successful result |
The Knowledge Base contains an up-to-date street graph imported from Open Street Map. The following functionalities are available only for specific Snap4city users with specific privileges. |
Steps |
Identify and get the latest version of the Open Street Map extract of your interest. Fill an Osmosis simple schema database reading from the Open Street Map extract. Launch appropriate SQL scripts to prepare the data for an efficient triplification. Launch the Sparqlify configured through an appropriate SML script to generate the triples. Improve the resulting triple files removing the heading and duplicate lines. Load the generated triple files to the Knowledge Base. |
Prerequisites
The process is proved to complete successfully on both the Ubuntu and the Debian Linux distributions. Anyway, it does not rely on distribution-specific features, so feel free to adopt the distribution you prefer.
The Osmosis distribution, documented at https://wiki.openstreetmap.org/wiki/Osmosis, is leveraged to read from the Open Street Map source files and write their content to a properly shaped relational database.
The Sparqlify generates the RDF triples reading from a relational database, based on a SML configuration file. Documentation and installation artefacts and instructions can be found in the main page of the project at http://aksw.org/Projects/Sparqlify.html, and on GitHub at https://github.com/SmartDataAnalytics/Sparqlify.
PostgreSQL is the recommended relational database engine for storing the Open Street Map data. Documentation and installation artefacts can be found at https://www.postgresql.org/.
PostGIS is an extension that allows an efficient and effective management of geometric and geographic data in PostgreSQL, and it is required to be installed for effectively working with the Open Street Map data. Documentation and installation artefacts can be found at https://postgis.net/.
The relational database schema that we have identified as the most appropriate is the Osmosis simple schema. The SQL scripts for the shaping of the database are part of the Osmosis. Detailed instructions can be found at https://wiki.openstreetmap.org/wiki/Osmosis/Detailed_Usage_0.43#PostGIS_Tasks_.28Simple_Schema.29.
Identify and get the latest version of the Open Street Map extract of your interest
We have identified the Geofabrik as the best source of Open Street Map extracts. It is a specialized portal where Open Street Map source files (and their compressed versions) can be found that address specific continents, countries, and regions. Also, the extracts of the OSC files are available, where the variations that occur to the Open Street Map are represented on a daily basis. You can get the Geofabrik Open Street Map extracts from download section of the portal, that can be reached at https://download.geofabrik.de/.
Fill an Osmosis simple schema database reading from the Open Street Map extract
The Osmosis distribution includes a command-line tool and a set of scripts that allow to perform a wide set of manipulations over the Open Street Map data, such as extractions, transformations, comparisons, and so on. In this use case, the tool is leveraged to fill a properly shaped relational database with the data extracted from the Open Street Map source files. Detailed instructions of how it could be performed can be found at https://wiki.openstreetmap.org/wiki/Osmosis/Detailed_Usage_0.46#--write-pgsimp_.28--ws.29.
Launch appropriate SQL scripts to prepare the data for an efficient triplification
After that the relational database has been filled with the Open Street Map data, a set of auxiliary tables must be created to speed up the following steps of the ingestion process. The creation of such tables is performed through the execution of a SQL script that can be found in our (DISIT Lab) GitHub repository at https://github.com/disit/osm2km4c/blob/master/sparqlify/install/performance_optimization.sql. The script is typically needed to be executed only once for each database, since it accesses data that tend to be immutable over the time, such as the listing of the Public Administrations that govern the territories of interest, their borders, and some country-dependent configurations.
After that, and every time that the Open Street Map data are updated in the relational database, some other optimizations are necessary. Such optimizations are to be performed executing the SQL script that can be found at https://github.com/disit/osm2km4c/blob/master/sparqlify/install/irdbcmap.sql.txt, in our (DISIT Lab) GitHub repository. At the beginning of such SQL script, a configuration section can be found. We reasonably expect that the only parameter of your real interest locates at line 25, and it is the Open Street Map unique identifier of the geographic boundary of the triplification. Indeed, even if your relational database contains, suppose, the Open Street Map data of a whole country, we do not recommend producing the triples for the whole country in a single execution. Indeed, we recommend producing the triples for one province at a time. This way, you are granted each execution to complete in a reasonable time.
Figure: The Open Street Map unique identifier for the Municipality of Helsinki is 34914
Launch the Sparqlify configured through an appropriate SML script to generate the triples
After that the data preparing outlined above has been performed, the Sparqlify can be launched to produce the triple files.
A sample invocation of the Sparqlify follows:
./sparqlify.sh -m ~/script.sml -h 192.168.0.110 -d pgsimple_fin
-U pgsimple_fin_reader -W pgsimple_fin_reader -o ntriples --dump > ~/triples.n3
A short description of the command arguments follows:
- m, the full path and file name of the SML configuration script where it is described how the data stored in the relational database should be used for generating the RDF triples. A ready-to-use script can be found at https://github.com/disit/osm2km4c/blob/master/sparqlify/install/irdbcmap.sml.txt, in our (DISIT Lab) GitHub repository;
- h, the name (or IP address) of the host where the relational database that contains the Open Street Map data (and that constitutes the Sparqlify source of data) can be found;
- d, the name of the relational database where the Open Street Map data can be found;
- U, the username to be used for authenticating to the relational database;
- W, the password to be used for authenticating to the relational database;
- o, whether the output file should include (nquads) or not to include (ntriples) the RDF graph URI;
- --dump, mandatory flag for that the generated triples could be produced in output;
- > ~/triples.n3, the full path of the file where the produced triples is stored.
Improve the resulting triple files removing the heading and duplicate lines
The first two lines of the output file that the Sparqlify generates are to be stripped away since they are heading lines and they could lead to errors at the time of loading the triple file to the graph database. Also, duplicate lines should be stripped away since they cause a useless performance degradation.
A way this can be achieved in Linux is proposed below here:
tail -n +3 sparqlify-output.n3 > no-headers.n3
sort no-headers.n3 | uniq > ready-to-use.n3
Load the generated triple files to the Knowledge Base
The bulk loading functionality of the graph database should be leveraged for loading the newly generated triples so that they could be part of the Knowledge Base. We recommend loading each province to a separate graph. We recommend removing the existing triples before loading the new ones. Specific precautions could be necessary for some graph databases. Refer to the documentation of the graph database for further details.
Read more
An even more comprehensive description of the process can be found in the article
Comments
irdbcmap.sql, unable to create all the tables
aliferisi - Tue, 09/07/2021 - 16:24Hello
I'm having an issue at "Launch appropriate SQL scripts to prepare the data for an efficient triplification" step.
To be more specific, i renamed the SQL script https://github.com/disit/osm2km4c/blob/master/sparqlify/install/irdbcmap.sql.txt to irdbcmap.sql and replaced the OSM id inside of it. After that, i executed the SQL script with the command # psql -d pgsimple_gre -f osm2km4c/sparqlify/install/irdbcmap.sql and got the following 3 errors:
'''
psql:osm2km4c/sparqlify/install/irdbcmap.sql:2448: NOTICE: table "node_oneway" does not exist, skipping
DROP TABLE
psql:osm2km4c/sparqlify/install/irdbcmap.sql:2473: ERROR: set-returning functions are not allowed in CASE
LINE 11: else unnest(array['forward','backward'])
^
HINT: You might be able to move the set-returning function into a LATERAL FROM item.
psql:osm2km4c/sparqlify/install/irdbcmap.sql:2477: NOTICE: table "way_oneway" does not exist, skipping
DROP TABLE
psql:osm2km4c/sparqlify/install/irdbcmap.sql:2522: ERROR: set-returning functions are not allowed in CASE
LINE 11: else unnest(array['forward','backward'])
^
HINT: You might be able to move the set-returning function into a LATERAL FROM item.
psql:osm2km4c/sparqlify/install/irdbcmap.sql:2526: NOTICE: table "relation_oneway" does not exist, skipping
DROP TABLE
psql:osm2km4c/sparqlify/install/irdbcmap.sql:2569: ERROR: set-returning functions are not allowed in CASE
LINE 11: else unnest(array['forward','backward'])
^
HINT: You might be able to move the set-returning function into a LATERAL FROM item.
'''
Please, provide a solution since i'm not able to generate the triple file (.n3) in the next process step since the tables doesn't exist.
Thanks
the error as "set-returning
roottooladmin1 - Wed, 09/08/2021 - 09:54the error as "set-returning functions are not allowed in CASE" seems to be due to the version of Postgress 10.
read: https://stackoverflow.com/questions/52952384/set-returning-functions-are-not-allowed-in-case-in-postgresql
we are using version 9.8.5. So that we suggest you to change version it may solve.
Snap4city back office team
Thanks for your help, i had
aliferisi - Wed, 09/08/2021 - 13:51Thanks for your help, i had installed Postgres 11 which i completly uninstalled.
Then i installed postgres-9.6 since 9.8.5 wasn't available in apt-get package for debian as you suggested. At the end i managed to execute the sql script without any errors.
However, on the next step, where i have to generate the triple file, i encounter a problem as a result my triple file remains empty.
Output (few of the last lines):
Could that be another version compatibility issue?
it seems to us that you
roottooladmin1 - Thu, 09/09/2021 - 12:16it seems to us that you partially posted the query, or there is some other mixed segments at the beginning or the query you reported.
I suggest you to use the KBSSM VM to start using instead of reinstalling all.
see from https://www.snap4city.org/drupal/node/471
I do not understood which kind of data you fred into and the context of your database .
it is very difficult to help you on data and context dependent problems, without knowing the details.
snap4city support
See also https://www
roottooladmin1 - Thu, 09/09/2021 - 12:53See also https://www.snap4city.org/drupal/node/535
at the end:
Triplify
Loading maps in a local database and keeping them up to date is functional to the generation of RDF triples, that is what we perform in this third step. See the triplify.sh script to learn how it can be done. Note that it is the only script that you will need to customize to generate your own triples. Indeed, both the triplify.sql and triplify.sml scripts are thought to be left unaltered. It is a three-step process:
Below here are some Web resources where you can learn more about tools and projects that we have met in this section of the guide:
Unable to create all tables with SQL script irdbcmap.sql
aliferisi - Tue, 09/07/2021 - 16:26Hello
I'm having an issue at "Launch appropriate SQL scripts to prepare the data for an efficient triplification" step.
To be more specific, i renamed the SQL script: https://github.com/disit/osm2km4c/blob/master/sparqlify/install/irdbcmap.sql.txt to irdbcmap.sql and replaced the OSM id. After that, i executed the SQL script and got the following errors:
'''
psql:osm2km4c/sparqlify/install/irdbcmap.sql:2448: NOTICE: table "node_oneway" does not exist, skipping
DROP TABLE
psql:osm2km4c/sparqlify/install/irdbcmap.sql:2473: ERROR: set-returning functions are not allowed in CASE
LINE 11: else unnest(array['forward','backward'])
^
HINT: You might be able to move the set-returning function into a LATERAL FROM item.
psql:osm2km4c/sparqlify/install/irdbcmap.sql:2477: NOTICE: table "way_oneway" does not exist, skipping
DROP TABLE
psql:osm2km4c/sparqlify/install/irdbcmap.sql:2522: ERROR: set-returning functions are not allowed in CASE
LINE 11: else unnest(array['forward','backward'])
^
HINT: You might be able to move the set-returning function into a LATERAL FROM item.
psql:osm2km4c/sparqlify/install/irdbcmap.sql:2526: NOTICE: table "relation_oneway" does not exist, skipping
DROP TABLE
psql:osm2km4c/sparqlify/install/irdbcmap.sql:2569: ERROR: set-returning functions are not allowed in CASE
LINE 11: else unnest(array['forward','backward'])
^
HINT: You might be able to move the set-returning function into a LATERAL FROM item.
'''
Please, provide a solution since i'm not able to generate the triple file (.n3) in the next process step because table "node_oneway" doesn't exist.
Thanks
the error as "set-returning
roottooladmin1 - Wed, 09/08/2021 - 09:54the error as "set-returning functions are not allowed in CASE" seems to be due to the version of Postgress 10.
read: https://stackoverflow.com/questions/52952384/set-returning-functions-are-not-allowed-in-case-in-postgresql
we are using version 9.8.5. So that we suggest you to change version it may solve.
Snap4city back office team