TC5.10 - Open Street Map ingestion process

Test Case Title	TC5.10 - Open Street Map ingestion process (OSM2SM)
Goal	As an administrator, I can fill the Knowledge Base with a street graph obtained from Open Street Map.
Prerequisites	Linux installed; Osmosis installed; Sparqlify installed; PostgreSQL with the PostGIS extension; a database shaped with the Osmosis simple schema. You can ask DISIT Lab for an appliance or a service.
Expected successful result	The Knowledge Base contains an up-to-date street graph imported from Open Street Map. The following functionalities are available only for specific Snap4city users with specific privileges.
Steps	1. Identify and get the latest version of the Open Street Map extract of your interest. 2. Fill an Osmosis simple schema database reading from the Open Street Map extract. 3. Launch appropriate SQL scripts to prepare the data for an efficient triplification. 4. Launch the Sparqlify configured through an appropriate SML script to generate the triples. 5. Improve the resulting triple files removing the heading and duplicate lines. 6. Load the generated triple files to the Knowledge Base.

Since 2024, a Docker-based version of the tool that converts Open Street Map data into the Knowledge Base is also available:

https://github.com/disit/osm2km4c/tree/master/osm2km4c-docker/Dockers

Prerequisites

The process has been verified to complete successfully on both the Ubuntu and Debian Linux distributions. It does not rely on distribution-specific features, however, so feel free to use the distribution you prefer.

The Osmosis distribution, documented at https://wiki.openstreetmap.org/wiki/Osmosis, is leveraged to read from the Open Street Map source files and write their content to a properly shaped relational database.

Sparqlify generates the RDF triples by reading from a relational database, driven by an SML configuration file. Documentation, installation artefacts, and instructions can be found on the main page of the project at http://aksw.org/Projects/Sparqlify.html, and on GitHub at https://github.com/SmartDataAnalytics/Sparqlify.

PostgreSQL is the recommended relational database engine for storing the Open Street Map data. Documentation and installation artefacts can be found at https://www.postgresql.org/.

PostGIS is a PostgreSQL extension for the efficient management of geometric and geographic data, and it must be installed to work with the Open Street Map data. Documentation and installation artefacts can be found at https://postgis.net/.

The relational database schema that we have identified as the most appropriate is the Osmosis simple schema. The SQL scripts that shape the database are part of the Osmosis distribution. Detailed instructions can be found at https://wiki.openstreetmap.org/wiki/Osmosis/Detailed_Usage_0.43#PostGIS_Tasks_.28Simple_Schema.29.
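As an illustration only, the shaping of the database could be sketched as follows. The function name create_pgsimple_db, the database name pgsimple_fin, and the OSMOSIS_HOME location are assumptions made for this example; the schema script name is the one we expect in the script folder of the Osmosis distribution, so check it against your installation before running anything.

```shell
# Sketch, not a definitive procedure: create a database and shape it
# with the Osmosis simple schema.
# Assumption: OSMOSIS_HOME points to the unpacked Osmosis distribution.
OSMOSIS_HOME="${OSMOSIS_HOME:-/opt/osmosis}"

create_pgsimple_db() {
    # $1 is the name of the database to create (hypothetical, e.g. pgsimple_fin)
    createdb "$1" &&
    # PostGIS must be enabled before the schema script is executed
    psql -d "$1" -c "CREATE EXTENSION IF NOT EXISTS postgis;" &&
    # Schema script expected in the Osmosis distribution (verify the file name)
    psql -d "$1" -f "$OSMOSIS_HOME/script/pgsimple_schema_0.6.sql"
}

# Usage (commented out on purpose):
# create_pgsimple_db pgsimple_fin
```

The three commands are chained with &&, so the schema script is not executed if the database or the extension could not be created.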

Identify and get the latest version of the Open Street Map extract of your interest

We have identified Geofabrik as the best source of Open Street Map extracts. It is a specialized portal that offers Open Street Map source files (and their compressed versions) for specific continents, countries, and regions. OSC files are also available, which represent the changes made to the Open Street Map on a daily basis. You can get the Geofabrik Open Street Map extracts from the download section of the portal, which can be reached at https://download.geofabrik.de/.
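A minimal sketch of how an extract could be fetched from the command line is given below. The helper names geofabrik_url and fetch_extract are hypothetical, and the URL pattern (region path followed by -latest.osm.pbf) is an assumption based on how the portal currently names its files, so verify the actual link in the download section first.

```shell
# Sketch: download the latest extract of a region from Geofabrik.
# Assumption: extracts are published as <region path>-latest.osm.pbf.

geofabrik_url() {
    # $1 is the path of the region on the portal, e.g. europe/finland
    printf 'https://download.geofabrik.de/%s-latest.osm.pbf\n' "$1"
}

fetch_extract() {
    # -f fails on HTTP errors, -L follows redirects,
    # -O saves the file under its remote name in the current directory
    curl -fLO "$(geofabrik_url "$1")"
}

# Usage (commented out on purpose, it downloads a large file):
# fetch_extract europe/finland
```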

Fill an Osmosis simple schema database reading from the Open Street Map extract

The Osmosis distribution includes a command-line tool and a set of scripts that perform a wide range of manipulations on the Open Street Map data, such as extractions, transformations, and comparisons. In this use case, the tool is used to fill a properly shaped relational database with the data extracted from the Open Street Map source files. Detailed instructions on how to do this can be found at https://wiki.openstreetmap.org/wiki/Osmosis/Detailed_Usage_0.46#--write-pgsimp_.28--ws.29.
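For orientation, an invocation of Osmosis for this step might look like the sketch below. It assumes a PBF extract and uses the --read-pbf and --write-pgsimp tasks described in the Osmosis documentation; the function name fill_pgsimple and all the values in the usage line (file, host, database, credentials) are placeholders chosen for this example.

```shell
# Sketch: fill an Osmosis simple schema database from a PBF extract.
# Assumption: the osmosis launcher is on the PATH and the target
# database has already been shaped with the simple schema.

fill_pgsimple() {
    # $1 extract file, $2 database host, $3 database name,
    # $4 user with write privileges, $5 password
    osmosis --read-pbf file="$1" \
            --write-pgsimp host="$2" database="$3" user="$4" password="$5"
}

# Usage (commented out on purpose, all values are hypothetical):
# fill_pgsimple finland-latest.osm.pbf 192.168.0.110 pgsimple_fin pgsimple_fin_writer secret
```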

Launch appropriate SQL scripts to prepare the data for an efficient triplification

After the relational database has been filled with the Open Street Map data, a set of auxiliary tables must be created to speed up the following steps of the ingestion process. These tables are created by executing a SQL script that can be found in our (DISIT Lab) GitHub repository at https://github.com/disit/osm2km4c/blob/master/sparqlify/install/performance_optimization.sql. The script typically needs to be executed only once for each database, since it accesses data that tend not to change over time, such as the listing of the Public Administrations that govern the territories of interest, their borders, and some country-dependent configurations.

After that, and every time the Open Street Map data are updated in the relational database, some further optimizations are necessary. They are performed by executing the SQL script that can be found at https://github.com/disit/osm2km4c/blob/master/sparqlify/install/irdbcmap.sql.txt, in our (DISIT Lab) GitHub repository. The script begins with a configuration section. We expect that the only parameter of real interest to you is the one at line 25: the Open Street Map unique identifier of the geographic boundary of the triplification. Even if your relational database contains the Open Street Map data of a whole country, we do not recommend producing the triples for the whole country in a single execution. Instead, we recommend producing the triples for one province at a time, so that each execution completes in a reasonable time.
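The two scripts could be launched with psql along the lines of the sketch below. The function names, the OSM2KM4C variable (assumed to point to a local clone of the repository), and the database name in the usage lines are illustrative. The ON_ERROR_STOP setting is our suggestion rather than part of the original procedure: it makes psql stop at the first SQL error instead of continuing with tables missing.

```shell
# Sketch: run the two data preparation scripts with psql.
# Assumption: OSM2KM4C is the path of a local clone of the osm2km4c repository.
OSM2KM4C="${OSM2KM4C:-./osm2km4c}"

run_sql() {
    # $1 database name, $2 SQL script; abort at the first SQL error
    psql -v ON_ERROR_STOP=1 -d "$1" -f "$2"
}

prepare_database_once() {
    # Auxiliary tables, typically needed only once for each database
    run_sql "$1" "$OSM2KM4C/sparqlify/install/performance_optimization.sql"
}

prepare_boundary() {
    # To be repeated after every data update; set the boundary identifier
    # in the configuration section of the script before running it
    run_sql "$1" "$OSM2KM4C/sparqlify/install/irdbcmap.sql.txt"
}

# Usage (commented out on purpose):
# prepare_database_once pgsimple_fin
# prepare_boundary pgsimple_fin
```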

Figure: The Open Street Map unique identifier for the Municipality of Helsinki is 34914

Launch the Sparqlify configured through an appropriate SML script to generate the triples

After the data preparation outlined above has been performed, Sparqlify can be launched to produce the triple files.

A sample invocation of Sparqlify follows:

./sparqlify.sh -m ~/script.sml -h 192.168.0.110 -d pgsimple_fin \
  -U pgsimple_fin_reader -W pgsimple_fin_reader -o ntriples --dump > ~/triples.n3

A short description of the command arguments follows:

-m, the full path and file name of the SML configuration script that describes how the data stored in the relational database should be used to generate the RDF triples. A ready-to-use script can be found at https://github.com/disit/osm2km4c/blob/master/sparqlify/install/irdbcmap.sml.txt, in our (DISIT Lab) GitHub repository;
-h, the name (or IP address) of the host of the relational database that contains the Open Street Map data (and that constitutes the Sparqlify source of data);
-d, the name of the relational database where the Open Street Map data can be found;
-U, the username to be used for authenticating to the relational database;
-W, the password to be used for authenticating to the relational database;
-o, whether the output file should include (nquads) or omit (ntriples) the RDF graph URI;
--dump, mandatory flag for the generated triples to be written to the output;
> ~/triples.n3, the full path of the file where the produced triples are stored.

Improve the resulting triple files removing the heading and duplicate lines

The first two lines of the output file that Sparqlify generates must be stripped away, since they are heading lines and could lead to errors when loading the triple file into the graph database. Duplicate lines should be stripped away as well, since they cause a needless performance degradation.

One way to achieve this in Linux is proposed below:

tail -n +3 sparqlify-output.n3 > no-headers.n3
sort no-headers.n3 | uniq > ready-to-use.n3
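The two commands above can also be combined into a single pipeline that avoids the intermediate file, as in the following sketch. The function name clean_triples is hypothetical. The sort -u option has the same effect as sort followed by uniq, and the LC_ALL=C setting is an optional choice of ours that sorts by byte value, which is usually faster on large files.

```shell
# Sketch: strip the two heading lines and the duplicate lines in one pass.

clean_triples() {
    # $1 file produced by Sparqlify, $2 cleaned file to be loaded
    # tail -n +3 starts the output at the third line of the input
    tail -n +3 "$1" | LC_ALL=C sort -u > "$2"
}

# Usage (commented out on purpose):
# clean_triples sparqlify-output.n3 ready-to-use.n3
```

Note that the lines of the resulting file are sorted, which is harmless for a line-based triple format.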

Load the generated triple files to the Knowledge Base

The bulk loading functionality of the graph database should be used to load the newly generated triples, so that they become part of the Knowledge Base. We recommend loading each province into a separate graph, and removing the existing triples before loading the new ones. Specific precautions could be necessary for some graph databases. Refer to the documentation of the graph database for further details.

Read more

An even more comprehensive description of the process can be found in the article

From the Open Street Map to the Km4City street graph
- https://www.snap4city.org/download/video/From%20the%20Open%20Street%20Map%20to%20the%20Km4City%20street%20graph.pdf

Comments

irdbcmap.sql, unable to create all the tables

aliferisi - Tue, 09/07/2021 - 16:24

Hello

I'm having an issue at "Launch appropriate SQL scripts to prepare the data for an efficient triplification" step.

To be more specific, i renamed the SQL script https://github.com/disit/osm2km4c/blob/master/sparqlify/install/irdbcmap.sql.txt to irdbcmap.sql and replaced the OSM id inside of it. After that, i executed the SQL script with the command # psql -d pgsimple_gre -f osm2km4c/sparqlify/install/irdbcmap.sql and got the following 3 errors:
'''
psql:osm2km4c/sparqlify/install/irdbcmap.sql:2448: NOTICE: table "node_oneway" does not exist, skipping
DROP TABLE
psql:osm2km4c/sparqlify/install/irdbcmap.sql:2473: ERROR: set-returning functions are not allowed in CASE
LINE 11: else unnest(array['forward','backward'])
^
HINT: You might be able to move the set-returning function into a LATERAL FROM item.
psql:osm2km4c/sparqlify/install/irdbcmap.sql:2477: NOTICE: table "way_oneway" does not exist, skipping
DROP TABLE
psql:osm2km4c/sparqlify/install/irdbcmap.sql:2522: ERROR: set-returning functions are not allowed in CASE
LINE 11: else unnest(array['forward','backward'])
^
HINT: You might be able to move the set-returning function into a LATERAL FROM item.
psql:osm2km4c/sparqlify/install/irdbcmap.sql:2526: NOTICE: table "relation_oneway" does not exist, skipping
DROP TABLE
psql:osm2km4c/sparqlify/install/irdbcmap.sql:2569: ERROR: set-returning functions are not allowed in CASE
LINE 11: else unnest(array['forward','backward'])
^
HINT: You might be able to move the set-returning function into a LATERAL FROM item.
'''

Please, provide a solution since i'm not able to generate the triple file (.n3) in the next process step since the tables doesn't exist.

Thanks

the error as "set-returning

roottooladmin1 - Wed, 09/08/2021 - 09:54

The error "set-returning functions are not allowed in CASE" seems to be due to PostgreSQL version 10.

Read: https://stackoverflow.com/questions/52952384/set-returning-functions-are-not-allowed-in-case-in-postgresql
We are using version 9.8.5, so we suggest that you change version; it may solve the problem.

Snap4city back office team

Thanks for your help, i had

aliferisi - Wed, 09/08/2021 - 13:51

Thanks for your help. I had installed Postgres 11, which I completely uninstalled.

Then I installed postgres-9.6, since the 9.8.5 you suggested wasn't available as an apt-get package for Debian. In the end I managed to execute the SQL script without any errors.

However, in the next step, where I have to generate the triple file, I encounter a problem: my triple file remains empty.

Output (few of the last lines):

select * from NodeStreetNumberRoad) a_235
WHERE ("graph_uri" IS NOT NULL) AND ("en_id" IS NOT NULL)
UNION ALL
SELECT NULL::text "C_14", NULL::text "C_58", NULL::integer "C_13", NULL::text "C_57", NULL::text "C_12", NULL::text "C_56", NULL::text "C_11", NULL::text "C_55", NULL::text "C_10", NULL::text "C_54", 'http://www.disit.org/km4city/schema#Restriction'::text "C_53", NULL::integer "C_52", NULL::text "C_51", NULL::text "C_50", NULL::text "C_49", NULL::text "C_48", NULL::text "C_25", NULL::text "C_24", NULL::text "C_23", NULL::integer "C_22", NULL::integer "C_66", NULL::text "C_21", NULL::text "C_65", NULL::text "C_20", NULL::text "C_64", NULL::double precision "C_63", NULL::integer "C_62", NULL::text "C_61", NULL::text "C_60", NULL::text "C_3", NULL::text "C_5", NULL::text "C_4", NULL::text "C_7", NULL::text "C_6", NULL::text "C_19", NULL::text "C_9", NULL::text "C_18", NULL::integer "C_8", NULL::text "C_17", NULL::text "C_16", NULL::text "C_15", NULL::text "C_59", NULL::text "C_36", 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'::text "C_35", NULL::text "C_34", "graph_uri" "C_33", NULL::integer "C_32", NULL::text "C_31", NULL::text "C_30", "to_uri" "C_29", "from_uri" "C_28", NULL::text "C_27", NULL::text "C_26", NULL::text "C_47", NULL::double precision "C_46", NULL::text "C_45", NULL::integer "C_44", NULL::geometry "C_43", NULL::integer "C_42", NULL::text "C_41", NULL::text "C_40", NULL::text "C_39", NULL::text "C_38", NULL::text "C_37"
FROM
(
select * from turn_restrictions) a_236
WHERE ("from_uri" IS NOT NULL) AND ("graph_uri" IS NOT NULL) AND ("to_uri" IS NOT NULL)
) "a_237"

2021-09-08 14:02:44,438 TRACE org.aksw.sparqlify.core.sparql.QueryExecutionSparqlify: Closed connection: [HikariProxyConnection@1758876146 wrapping org.postgresql.jdbc.PgConnection@1654a892]
2021-09-08 14:02:44,441 DEBUG com.zaxxer.hikari.pool.ProxyConnection: HikariPool-1 - Executed rollback on connection org.postgresql.jdbc.PgConnection@1654a892 due to dirty commit state on close().
2021-09-08 14:02:44,441 DEBUG com.zaxxer.hikari.pool.PoolBase: HikariPool-1 - Reset (autoCommit) on connection org.postgresql.jdbc.PgConnection@1654a892

Could that be another version compatibility issue?

it seems to us that you

roottooladmin1 - Thu, 09/09/2021 - 12:16

It seems to us that you posted the query only partially, or that some other segments are mixed in at the beginning of the query you reported.
I suggest that you start from the KBSSM VM instead of reinstalling everything.

See https://www.snap4city.org/drupal/node/471
I do not understand which kind of data you fed into your database, nor its context.

It is very difficult to help you with data- and context-dependent problems without knowing the details.

snap4city support

Triplify

Loading maps into a local database and keeping them up to date serves the generation of RDF triples, which is what we perform in this third step. See the triplify.sh script to learn how it can be done. Note that it is the only script that you will need to customize to generate your own triples: both the triplify.sql and triplify.sml scripts are meant to be left unaltered. It is a three-step process:

Prepare the data in your PostgreSQL database by executing the triplify.sql script. In this step you must provide a parameter, again named boundary, that is the OSM ID of the OSM Relation that defines the boundary for the generation of triples. This boundary must be fully contained within the boundary that was set during initialization;
Generate RDF triples from the data that you have prepared in your PostgreSQL database, using the sparqlify tool;
Perform some clean-up operations on the n3 file that sparqlify produces.

Below here are some Web resources where you can learn more about tools and projects that we have met in this section of the guide:

