Karma - in practice: triplification process

An introductory guide for generating RDF triples from relational data in minutes

Karma (http://usc-isi-i2.github.io/karma/) is a data integration tool developed at the Center on Knowledge Graphs of the Information Sciences Institute at the University of Southern California.

Here at DISIT, we exploit Karma for triplifying relational data.

This document is an introductory guide that helps beginners take their first steps with the tool.

Roadmap

Here are the steps to have your triples in your pocket in minutes:

  1. Get one of the ready-to-use DISIT VMs
  2. Launch the Karma server
  3. Build your own model:
    1. Load vocabularies
    2. Load relational tables
    3. (Optional) Load R2RML models
    4. Define mappings
  4. Export your model
  5. Launch the command-line tool
  6. Enjoy!

Get one of the ready-to-use DISIT VMs

The DISIT Lab makes available through its Drupal portal a set of ready-to-use virtual machines specifically oriented to data integration. Here is how you can get one of them:

  1. Connect to https://www.disit.org/drupal/?q=node/6690 and scroll down to the section “MACCHINA VIRTUALE, VMSDETL, GIA' PRONTA”
  2. Get the “Versione del 2017/2018 0.8 con Phoenix” at https://www.disit.org/vmsdetl/VMSDETL-2017-v0-8.rar, or the “Versione del 2017/2018 0.8 con Phoenix per Virtualbox” at https://www.disit.org/vmsdetl/VMSDETL-2017-v0-8-ovf.rar, unless you have a good reason for picking a different one
  3. Wait for the download to complete, and extract the archive
  4. Launch the VM player of your choice
  5. Open the VM
  6. Run it

Launch the Karma server

Do the following to run the Karma server:

  1. Open a shell
  2. Move to ~/programs/Web-Karma-master/karma-web
  3. Run mvn -Djetty.port=9999 jetty:run
  4. Wait while the Jetty server comes up
  5. Connect to localhost:9999 where you will find the Web application for building your model
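
Step 4 can take a while the first time, because Maven may need to download dependencies. As a convenience, the wait can be scripted. The following is a minimal sketch, assuming the server is started with the port used above (9999); the function name wait_for_karma and the timeout values are illustrative choices, not part of Karma.

```python
import time
import urllib.error
import urllib.request

# Bypass any configured HTTP proxy: the Karma server runs on the local machine.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_karma(url="http://localhost:9999", timeout_s=300.0, poll_s=5.0):
    """Poll `url` until it answers or `timeout_s` elapses.

    Returns True if the server answered, False if the timeout expired.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with _opener.open(url, timeout=poll_s):
                return True  # any HTTP response means Jetty is serving
        except urllib.error.HTTPError:
            return True  # the server is up, even if this page returned an error status
        except (urllib.error.URLError, OSError):
            pass  # not listening yet (connection refused, timeout, ...)
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Calling wait_for_karma() in a second shell, right after launching mvn in the first one, returns True as soon as the Web application can be opened in the browser.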

Build your own model

Before Karma can produce the RDF triples for you, you have to tell it how the relational data should be mapped to semantic data. Documents that describe such a mapping are called R2RML models. Models are built with a dedicated Karma Web application, and they are exported as ttl files.

Below are the basic instructions for building a model from scratch or editing an existing one.

Load vocabularies

Have you connected to http://localhost:9999? Can you see the Karma Web application in your browser?

Right, you are ready to load your vocabularies.

Identify the classes and properties that you want to appear in the RDF triples produced by the whole process. Identify the vocabularies where such classes and properties are defined. Load them.

Here is how you load a vocabulary:

  1. Hit Import, at the top left corner of the Web page
  2. Hit From File
  3. Select the vocabulary file (it can be an OWL, RDF/XML, or TTL file)
  4. Leave OWL Ontology selected, and hit Next
  5. Select the correct file encoding if the proposed one is wrong, and hit Import

You should now see your newly imported vocabulary displayed in the Command History (left column).

Load relational tables

Identify the tables in your RDB where the source data can be found. Load them into your model.

Here is how you load a table:

  1. Hit Import, at the top left corner of the Web page
  2. Hit Database Table. The Import Database Table dialog should open.
  3. Fill in the form with authentication data and RDB name, and hit OK
  4. A table listing should appear below the form
  5. Put the mouse pointer over the table of your interest
  6. Buttons Import and Preview should appear at the right of the table name. Hit Import.
  7. The confirmation message “Table imported in the workspace!” should appear. Hit OK.
  8. Repeat steps 5 – 7 for each table where source data are found
  9. Hit Close at the bottom right corner of the Import Database Table dialog to dismiss it

(Optional) Load R2RML models

If you have already built and exported a model in the past and now just need to modify it, you can load and apply your existing model instead of rebuilding it from scratch.

Here is how you apply an existing model:

  1. Identify the table to which the model has to be applied, and hit the triangle displayed next to the table name
  2. Select Apply R2RML Model, and then From File
  3. Select the ttl file that contains your model, and hit Open
  4. Done. Classes and links should appear in the workspace.

Define mappings

Here is how you specify that a column of an RDB table maps to a data property of a semantic class, and that a column contains an identifier that can be used for building the URIs of the instances of the semantic class:

  1. Below the RDB table name, find the blue box that contains the column name written in white
  2. Hit the white triangle next to the column name
  3. Hit Set Semantic Type
  4. Tick the checkbox to the left of property of Class
  5. Hit the Edit button to the right of property of Class
  6. Select the semantic class from the All Classes list. Use the Class textbox for filtering.
  7. Select the property from the All Properties list. Use the Property textbox for filtering.
  8. If the column is a key, tick the Mark as key for the class checkbox
  9. Optionally, map the column to a typed literal by filling in the textbox below the label Literal type
  10. When you are done, hit Save. Repeat the procedure for each of the columns to be mapped.
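
To make the effect of these settings concrete, here is a minimal sketch of what such a mapping conceptually produces for one table row. It is not Karma code: the function names, the base URI http://example.org/stop/, the column names, and the choice of properties are hypothetical, and the actual URIs that Karma generates depend on your model. The key column builds the subject URI, every other mapped column becomes a data property value, and a Literal type turns the value into a typed literal.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_FLOAT = "http://www.w3.org/2001/XMLSchema#float"


def literal(value, datatype=None):
    """Render a value as an N-Triples literal, optionally typed."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"' + (f"^^<{datatype}>" if datatype else "")


def row_to_triples(row, key_column, base_uri, class_uri, column_map):
    """Turn one table row into N-Triples lines.

    column_map maps a column name to (property URI, datatype URI or None).
    The column marked as key is used to build the subject URI.
    """
    subject = f"<{base_uri}{row[key_column]}>"
    triples = [f"{subject} <{RDF_TYPE}> <{class_uri}> ."]
    for column, (prop, datatype) in column_map.items():
        triples.append(f"{subject} <{prop}> {literal(row[column], datatype)} .")
    return triples


# Hypothetical row and mapping, for illustration only.
example_row = {"stop_id": "S1", "stop_name": "Central", "stop_lat": "43.77"}
example_map = {
    "stop_name": ("http://xmlns.com/foaf/0.1/name", None),
    "stop_lat": ("http://www.w3.org/2003/01/geo/wgs84_pos#lat", XSD_FLOAT),
}
for line in row_to_triples(
    example_row,
    key_column="stop_id",
    base_uri="http://example.org/stop/",  # hypothetical URI prefix
    class_uri="http://vocab.gtfs.org/terms#Stop",  # assumed gtfs: namespace
    column_map=example_map,
):
    print(line)
```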

Here is how you specify, instead, that a foreign key of a relational table corresponds to an object property of a semantic class. Scenario: a relational table stops, which corresponds to the semantic class Stop, has a column agency_id that holds the unique identifier of the agency that manages the stop. Each value in agency_id corresponds to one and only one value in a column, say id, of the relational table agencies. The table agencies corresponds to the semantic class Agency. We want each resource of class Stop to be linked to the appropriate resource of class Agency through the property gtfs:agency. To this end, do the following:

  1. Load relational table stops to workspace
  2. Map the data properties, linking the columns of table stops to the class Stop through the appropriate properties, as outlined above
  3. In the workspace, find the grey box with rounded corners that contains the name of the class Stop. Click the black triangle near the right edge of the box.
  4. Select Add Outgoing Link. A popup window should open.
  5. Type gtfs:agency in the box labelled Property, and gtfs:Agency in the box labelled To Class
  6. Click Save in the bottom right corner of the popup window to dismiss it.
  7. A new grey box, related to class Agency, will appear in the workspace, linked to the grey box related to class Stop through a link labelled agency for brevity
  8. Map column agency_id as a data property of class Agency, also specifying that it is a unique identifier, as described in the above paragraph
  9. Repeat for all foreign keys to be mapped, then go to next step (Export your model).
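
The outcome of the procedure above can be sketched as follows. This is an illustration of the resulting triples, not Karma code: the function name, the key column stop_id, the two base URIs, and the namespace assumed for the gtfs: prefix are hypothetical choices, and the URIs that Karma actually generates depend on your model. The point is that the same agency_id value always yields the same Agency URI, so every stop ends up linked to the right agency.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
GTFS = "http://vocab.gtfs.org/terms#"  # assumed namespace for the gtfs: prefix


def link_stops_to_agencies(stops, stop_base, agency_base):
    """Emit N-Triples lines linking each stop row to its agency.

    `stops` is a list of dicts with the (hypothetical) key column "stop_id"
    and the foreign key column "agency_id".
    """
    triples = []
    for row in stops:
        stop = f"<{stop_base}{row['stop_id']}>"
        # agency_id is marked as key of class Agency, so it builds the Agency URI
        agency = f"<{agency_base}{row['agency_id']}>"
        triples.append(f"{stop} <{RDF_TYPE}> <{GTFS}Stop> .")
        triples.append(f"{agency} <{RDF_TYPE}> <{GTFS}Agency> .")
        triples.append(f"{stop} <{GTFS}agency> {agency} .")
    return triples


# Hypothetical rows, for illustration only.
rows = [
    {"stop_id": "S1", "agency_id": "A7"},
    {"stop_id": "S2", "agency_id": "A7"},
]
for line in link_stops_to_agencies(
    rows, "http://example.org/stop/", "http://example.org/agency/"
):
    print(line)
```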

Learn more about the building of R2RML models at https://github.com/usc-isi-i2/karma-step-by-step/wiki.

Export your model

Once you have defined all needed mappings, you have to export your model to a ttl file, so that you can provide it as a parameter to the command-line Karma tool that performs the triplification.

Here is how you can export your model:

  1. Identify the RDB table whose model you wish to export
  2. Hit the black triangle at the right of the table name
  3. Select Publish, and then Model
  4. A popup should appear at the top right corner of the window, saying “R2RML Model published”
  5. Hit Manage Models, in the menu bar at the top of the page
  6. A list of all the models that you have exported in the current session should appear
  7. Identify the row corresponding to the last model exported, based on the File Name (the name of the RDB table) and the Publish Time.
  8. Copy the URL that you can find in the rightmost column of the list, and open it in a new tab
  9. Save As… the page that you have opened at step 8.
  10. Done. The file that you have saved at step 9 is your ready-to-use R2RML model.

Launch the command-line tool

Once you have exported your R2RML model as a ttl file, you are ready to perform the triplification:

  1. Open a shell
  2. Move to /home/ubuntu/programs/Web-Karma-master/karma-offline
  3. Launch the following as a single-line command, replacing the placeholder values (file paths, host name, credentials, port, database name, and table name) with your own:
     mvn exec:java
     -Dexec.mainClass="edu.isi.karma.rdf.OfflineRdfGenerator"
     -Dexec.args="--sourcetype DB --modelfilepath /path/to/model.ttl
     --outputfile /path/to/output_triples_file.n3 --dbtype MySQL
     --hostname mysql_srv_hostname_or_ip_address --username mysql_user
     --password mysql_pwd --portnumber 3306 --dbname mysql_dbname
     --tablename mysql_table_name" -Dexec.classpathScope=compile
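
Because the command is long and its quoting is easy to get wrong, it can help to assemble it programmatically, for instance when several tables have to be triplified. The following is a minimal sketch that only builds the argument list shown above; the function name build_karma_command is a hypothetical helper, and the sketch assumes that no value contains whitespace, since the content of -Dexec.args is split on spaces.

```python
import shlex


def build_karma_command(model_path, output_path, hostname, username, password,
                        dbname, tablename, dbtype="MySQL", port=3306):
    """Return the mvn command of step 3 as an argument list."""
    exec_args = [
        "--sourcetype", "DB",
        "--modelfilepath", model_path,
        "--outputfile", output_path,
        "--dbtype", dbtype,
        "--hostname", hostname,
        "--username", username,
        "--password", password,
        "--portnumber", str(port),
        "--dbname", dbname,
        "--tablename", tablename,
    ]
    for value in exec_args:
        # Assumption of this sketch: values with whitespace are not supported.
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"empty value or whitespace not supported: {value!r}")
    return [
        "mvn", "exec:java",
        "-Dexec.mainClass=edu.isi.karma.rdf.OfflineRdfGenerator",
        "-Dexec.args=" + " ".join(exec_args),
        "-Dexec.classpathScope=compile",
    ]


# Print the single-line command with shell quoting applied (placeholder values).
command = build_karma_command(
    "/path/to/model.ttl", "/path/to/output_triples_file.n3",
    "mysql_srv_hostname_or_ip_address", "mysql_user", "mysql_pwd",
    "mysql_dbname", "mysql_table_name",
)
print(shlex.join(command))
```

The returned list can be passed to subprocess.run(command, cwd=..., check=True), with cwd set to the karma-offline directory of step 2, which avoids shell quoting altogether.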

Report mistakes and provide suggestions about this guide

See the help desk page on the left side menu