TC6.8 - ETL processes for data transformation, and exploiting MicroServices/API/RestCall

Test Case Title

TC6.8 - ETL processes for data transformation, and exploiting MicroServices/API/RestCall

Goal

Snap4City provides support for real-time analytics, which is performed by using a set of services. One of these is the possibility to:

Create ETL processes capable of performing transformations on data.

Get data from a SigFOX gateway.

Ingest data sets with real-time data, connecting them with the Knowledge Base.

Prerequisites

Using a PC or mobile device with a web browser

The following functionalities are available only for specific Snap4city users with specific privileges.

Expected successful result

Create an ETL process for data transformation directly in the snap4city context

Steps

 

 

Please note that to correctly perform this Test Case you need access to the ETL Virtual Machine, as described below. To have access to a Virtual Machine to perform ETL, please contact snap4city@disit.org.

 

Users roles for which this test case is available: Area Manager, ToolAdmin, RootAdmin
Users roles for which the test case is not available: Manager
User used to describe the Example: snap4city

Examples:

  • Login on the snap4city platform
  • Open the VM from the snap4city home page
  • Create the ETL processes DIRECTLY from the Virtual Machine in the snap4city portal

Users roles for which this test case is available: All
User used to describe this Example: ‘snap4city’ (role: AreaManager)

  • Go to the snap4city home page
  • Log in
  • The snap4city home page is shown
  • Now you can select one of the functionalities on the left panel
    • Note that the list of functionalities depends on the user role (and on the related permissions).

 


Example 1: Open the VM from the snap4city home page

Users roles for which this test case is available: AreaManager, ToolAdmin, RootAdmin
User used to describe this Example: ‘snap4city’ (role: AreaManager)

  • Open the VM from the snap4city home page (Example 1)
  • Go to the snap4city home page
  • Log in
  • If the user has developer permissions (as provided for the ‘AreaManager’ role), he/she finds the menu ‘Development Tools > ETL development’
  • Click on the menu entry and insert the correct password (the same credentials are requested twice)
     

    Fig.: VNC connection.

     

    Fig.: Virtual machine connection.

 

 


Example 2: Create the ETL processes DIRECTLY from the Virtual Machine in the snap4city portal

Users roles for which this test case is available: AreaManager, ToolAdmin, RootAdmin
User used to describe this Example: ‘snap4city’ (role: AreaManager)

  • Log in on the snap4city portal and open the Virtual Machine (Example 1)
  • To create an ETL process, it is necessary to use the Pentaho Kettle tool Spoon (http://www.pentaho.com), installed on the Virtual Machine available in the snap4city portal

For the ETL creation, follow these steps:

  • Locate a dataset of interest, which can be static (e.g., ‘http://www.nk.hel.fi/avoindata/avoin_data-avustukset.xls’) or real-time; then you can create the ETL (Extract, Transform and Load) process to ingest the data.
  • Open Spoon via a Terminal Emulator (search for the icon at the bottom of the screen or click on the ‘Application’ menu > ‘Terminal Emulator’), then write the command ‘spoon.sh’ and press ‘Enter’.


Fig.: Virtual Machine (VM) Terminal Emulator (command line).

 

  • The Spoon Interface will open; now it is possible to create/load ETL processes.


Fig.: Spoon Interface.

 

  • To create a new ETL, double-click on the ‘jobs’ tab (or right-click and select ‘New’) in the left menu.


Fig.: Spoon: searching the menu for a new job.

 

  • Then a new job is created


Fig.: Spoon new job.

 

  • Now you can create your ETL by connecting the steps (or blocks) available in the left menu (the connection is realized with the hops, or links).
    • Kettle is based on two key concepts (from an operating point of view):
      • Job (with extension “.kjb”);
      • Transformation (with extension “.ktr”), composed of several steps.

In Kettle, the data are rows that flow from one step to the next. A job manages the sequential execution of lower-level entities: transformations or other jobs.

  • Transformations and jobs are built using steps, which are classified into three main categories:
    • Input: process some kind of 'raw' resource (file, database query or system variables) and create an output stream of records from it.
    • Output (the reverse of input steps): accept records and store them in some external resource (file, database table, etc.).
    • Transforming: process input streams and perform particular actions on them (adding new fields/new records); these actions produce one or more output streams.


Fig.: some available Spoon steps.
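
To make the row-flow idea concrete, here is a minimal Python sketch (an illustration only: the real steps are configured graphically in Spoon, and the step names and sample rows below are hypothetical):

    import csv

    def input_step():
        # 'Input' category: create a stream of records from a raw resource.
        # The rows are hard-coded here for illustration (hypothetical data).
        yield {"applicant": "A", "amount": 1000}
        yield {"applicant": "B", "amount": 2500}

    def transforming_step(rows):
        # 'Transforming' category: add a new field to every row of the stream.
        for row in rows:
            row["amount_keur"] = row["amount"] / 1000.0
            yield row

    def output_step(rows, path):
        # 'Output' category: store the records in an external resource (a CSV file).
        rows = list(rows)
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=rows[0].keys())
            writer.writeheader()
            writer.writerows(rows)

    # Rows flow from the input step, through the transforming step, to the output step.
    output_step(transforming_step(input_step()), "Result_data.csv")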

 

  • To put a step into a job or into a Transformation, the drag-and-drop functionality is active. The connection is done thanks to the hops (links).
  • The first step is ‘Start’ and usually the last one is ‘Success’.


Fig.: Spoon job and first step: ‘Start’. 

 


Fig.: Spoon job: Job, Transformation, hops.

  • To save a job or a transformation:
    • Right-click on the related icon
    • Change the job/transformation name
    • Select the directory into which the file is saved (it is an XML file)
    • Then select the file name


Fig.: Spoon job saving. 

  • To connect to a web server, a dedicated Spoon step is present:
    • In a job, it is present in the menu ‘file management > HTTP’, or it can be retrieved using the text search filter on ‘Entries’.


Fig.: Spoon HTTP step in a job.

 

  • In a Transformation, it is present in the menu ‘Lookup > HTTP Client’ (or ‘HTTP Post’), or it can be retrieved using the text search filter on ‘Entries’.


Fig.: Spoon HTTP step in a transformation.
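
As a rough Python analogue of what the ‘HTTP Client’ step does inside a transformation (an illustration only; the field names below are hypothetical and the real step is configured graphically in Spoon):

    # Sketch of the 'HTTP Client' step inside a transformation: for each
    # incoming row containing a URL, fetch it and add the response body
    # as a new field of the row. Field names are hypothetical.
    from urllib.request import urlopen

    def http_client_step(rows, url_field="url", result_field="result"):
        for row in rows:
            with urlopen(row[url_field]) as response:   # HTTP GET request
                row[result_field] = response.read()     # raw response body
            yield row

    # Hypothetical usage with a single input row:
    rows = [{"url": "http://www.nk.hel.fi/avoindata/avoin_data-avustukset.xls"}]
    for row in http_client_step(rows):
        print(len(row["result"]), "bytes downloaded")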

SAMPLE: Helsinki_youth_subsidies (Static: the results are always the same or the same for a long period)

  • A set of samples is already present in the VM; for example, we can choose ‘Helsinki_youth_subsidies’, which is able to: i) download a dataset; ii) transform it; iii) save the new data in a file (in the VM file system). To verify the ETL functionalities, follow these steps:
    • Click on the menu, select ‘File > Open’ and choose the file ‘Ubuntu/Desktop/snap4cityETL/Helsinki_youth_subsidies_XLS/Ingestion/Maink.kjb’
    • In the Spoon Interface the Main Spoon Job appears.

Fig.: Spoon Interface: open the ‘Helsinki_youth_subsidies_XLS’ ETL.

 

Fig.: Spoon Interface:  the ‘Helsinki_youth_subsidies_XLS’ ETL opened. 

  • To run the ETL process, click on the main ‘Run’ button; a popup will open with a set of default values (including the URL from which to download the dataset); then click on the second ‘Run’ button.


Fig.: Spoon Interface: launch an ETL (‘Run button’). 
 

  • Once the process is launched, a set of windows will open. A relevant one is the ‘Logging’ window, from which it is possible to read the logs. When the logging window reports the sentence ‘Spoon – job has ended’, the job has ended with no errors.


Fig.:  Spoon Interface: ETL executes, logging details.

 

  • Open the ‘HTTP’ step to view how the connection with the web server is realized:
    • Right-click on the ‘HTTP’ icon and select ‘edit’


Fig.:  ETL execution: edit the HTTP step.

  • The URL field contains the direct URL used to download the dataset (or a parameter that can come from the job settings, from the database, from a configuration file, etc.)
  • The Target file field contains the file path into which the data coming from the web server are written


Fig.:  ETL execution: HTTP step details.
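
In plain Python terms, the behaviour of this job entry corresponds roughly to the following sketch (an illustration only; the URL and target path are the ones reported in the text below, and the folder is assumed to exist because it is created by the previous block):

    # Sketch of the HTTP job entry: download the content of the URL field
    # into the Target file. Illustration only.
    from urllib.request import urlretrieve

    url = "http://www.nk.hel.fi/avoindata/avoin_data-avustukset.xls"
    target_file = ("/home/ubuntu/Desktop/ETL_Output/Helsinki_youth_subsidies_XLS/"
                   "Original_data/avoin_data-avustukset.xls")
    urlretrieve(url, target_file)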

  • What are the actions performed by this process, and how is it possible to verify them? Some blocks can be expanded with a double left click or a single right click of the mouse.
    • Activities:
      • block ‘Start’: starts the process (see its details with ‘right click > edit’);
      • block ‘Create original data folder’:
        • if it does not exist, creates a new folder for the downloaded file: ‘/home/ubuntu/Desktop/ETL_Output/Helsinki_youth_subsidies_XLS/Original_data’
      • block ‘HTTP’: makes the HTTP REST request (to ‘http://www.nk.hel.fi/avoindata/avoin_data-avustukset.xls’) and saves the xls file (named ‘avoin_data-avustukset.xls’) in the folder created in the previous block
      • block ‘File Exists’: checks whether the downloaded file is already present
      • block ‘Create Elaborated data folder’: creates the folder ‘/home/ubuntu/Desktop/ETL_Output/Helsinki_youth_subsidies_XLS/Elaborated_data’
      • block ‘DUMMY’: does nothing
      • block ‘process data’: reads the downloaded file, performs the data analysis and saves the result in a new file in the folder ‘/home/ubuntu/Desktop/ETL_Output/Helsinki_youth_subsidies_XLS/Elaborated_data/Result_data.csv’
      • block ‘Success’: ends the process.
    • How to verify:
      • Search for the folders in the file system and verify the presence of the files; you can also delete them and launch the ETL process again.


Fig.: How to verify the Helsinki_youth_subsidies_XLS results.
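
For readers who prefer code to block diagrams, the whole flow of this sample job can be approximated by the following Python sketch (an approximation only: the real logic lives in the Kettle job and transformations above, and the ‘process data’ step here is reduced to a placeholder):

    # Rough Python sketch of the Helsinki_youth_subsidies_XLS job flow.
    # Approximation only: the real logic is in the Kettle job/transformations.
    import os
    from urllib.request import urlretrieve

    BASE = "/home/ubuntu/Desktop/ETL_Output/Helsinki_youth_subsidies_XLS"
    ORIGINAL = os.path.join(BASE, "Original_data")
    ELABORATED = os.path.join(BASE, "Elaborated_data")
    URL = "http://www.nk.hel.fi/avoindata/avoin_data-avustukset.xls"

    # block 'Create original data folder'
    os.makedirs(ORIGINAL, exist_ok=True)

    # block 'HTTP': download the dataset
    xls_path = os.path.join(ORIGINAL, "avoin_data-avustukset.xls")
    urlretrieve(URL, xls_path)

    # block 'File Exists'
    if not os.path.exists(xls_path):
        raise SystemExit("download failed")

    # block 'Create Elaborated data folder'
    os.makedirs(ELABORATED, exist_ok=True)

    # block 'process data': only a placeholder transformation is shown here
    # (the real transformation analyses the XLS content).
    result_path = os.path.join(ELABORATED, "Result_data.csv")
    with open(result_path, "w") as out:
        out.write("source_file,size_bytes\n")
        out.write("%s,%d\n" % (os.path.basename(xls_path), os.path.getsize(xls_path)))

    # block 'Success'
    print("job has ended")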

SAMPLE: ‘SigfoxSM’ (Real Time: the results change every minute)

  • Click on the menu, select ‘File > Open’ and choose the file ‘Ubuntu/Desktop/snap4cityETL/SigfoxSM/Ingestion/Main.kjb’
  • In the Spoon Interface the Main Spoon Job appears


Fig.: Select the Sigfox ETL from Spoon.

  • Run the SigFox ETL: click on the main ‘Run’ button and then on the ‘Run’ button of the pop-up. Note that the ETL manages two SigFox sensors.


Fig.: Run the Sigfox ETL from Spoon.

 


Fig.: Run the SigFOX ETL.

 

  • Verify that the ETL has been correctly executed:
    • In the logging tab it is possible to see the log details. If the text ‘Spoon – Job has ended’ is present, the job has been executed; if errors are present, it is possible to find them in the log (red text).


Fig.: ETL logging details: process executed without errors.

  • Verify the TRANSFORMATION Results:
    • Search for the data downloaded from the sensors. It is possible to find them in the following path:

  • Search for the data coming from the sensors and enriched by the ETL (only the relevant data coming from the sensors are kept, together with other needed information such as the observation time and the URI or ID associated with the sensors, etc.)
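
A minimal sketch of the kind of enrichment performed by this ETL (the field names, values and the sensor URI below are hypothetical; the real mapping is defined in the Kettle transformation):

    # Sketch of the enrichment done by the SigFox ETL: keep only the relevant
    # sensor fields and add the observation time and the serviceURI/ID of the
    # sensor. Field names and the URI below are hypothetical.
    from datetime import datetime, timezone

    def enrich(raw_reading, service_uri):
        return {
            "serviceUri": service_uri,
            "dateObserved": datetime.now(timezone.utc).isoformat(),
            "temperature": raw_reading.get("temp"),   # only relevant fields are kept
            "humidity": raw_reading.get("hum"),
        }

    print(enrich({"temp": 21.5, "hum": 48, "noise": "ignored"},
                 "http://www.disit.org/km4city/resource/SigfoxSensor_01"))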


Example 3: Open an ETL process connecting real time data to the Km4city KnowledgeBase, DIRECTLY from the Virtual Machine in the snap4city portal

Users roles for which this test case is available: AreaManager, ToolAdmin, RootAdmin
User used to describe this Example: ‘snap4city’ (role: AreaManager)

  • Log in on the snap4city portal and open the Virtual Machine (Example 1)
  • To create an ETL process, it is necessary to use the Pentaho Kettle tool Spoon (http://www.pentaho.com), installed on the Virtual Machine available in the snap4city portal
  • Open Spoon via a Terminal Emulator (search for the icon at the bottom of the screen or click on the ‘Application’ menu > ‘Terminal Emulator’), then write the command ‘spoon.sh’ and press ‘Enter’
  • The Spoon Interface will open; now it is possible to load ETL processes.

SAMPLE: Electric_vehicle_charging_kmz_phoenix (Static AND RealTIME)

This ETL is composed of two main sub ETLs:

STATIC ETL

  • To verify the ETL functionalities, follow these steps:
    • Click on the menu, select ‘File > Open’ and choose Main.kjb from the following folder: /home/ubuntu/Desktop/snap4cityETL/Electric_vehicle_charging_kmz_phoenix/Static/Ingestion
    • In the Spoon Interface the Main Spoon Job appears.


Fig.: Spoon Interface: open the ‘Electric_vehicle_charging_kmz_phoenix’ ETL.

 


Fig.: Spoon Interface:  the ‘Electric_vehicle_charging_kmz_phoenix’ ETL opened.

  • To run the ETL process, click on the main ‘Run’ button; a popup will open with a set of default values (including the URL from which to download the dataset); then click on the second ‘Run’ button.

 


Fig.: Spoon Interface: launch an ETL (‘Run button’).

 

  • Once the process is launched, a set of windows will open. A relevant one is the ‘Logging’ window, from which it is possible to read the logs. When the logging window reports the sentence ‘Spoon – job has ended’, the job has ended with no errors.


Fig.:  Spoon Interface: ETL executes, logging details.

At this point, the ETL has ingested the Electric vehicle charging data, improved their quality and produced the triples in Turtle format (N3 files).

To verify, open the resulting file with any text editor (e.g., gedit); the file is stored in the folder /home/ubuntu/Desktop/Triples/Servizi/Electric_vehicle_charging_kmz_ST/2018_05/31/16/1354/
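
As an alternative to gedit, the produced N3 files can also be inspected programmatically; a minimal sketch assuming the rdflib Python library is installed (the file name ‘triples.n3’ is a placeholder for one of the files in the folder above):

    # Sketch: parse one of the produced N3 files and print some of its triples.
    # Assumes rdflib is installed; the file name below is a placeholder.
    from rdflib import Graph

    g = Graph()
    g.parse("/home/ubuntu/Desktop/Triples/Servizi/Electric_vehicle_charging_kmz_ST/"
            "2018_05/31/16/1354/triples.n3", format="n3")

    print(len(g), "triples loaded")
    for subject, predicate, obj in list(g)[:10]:   # show the first few triples
        print(subject, predicate, obj)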

 

DYNAMIC (REAL TIME) ETL

  • To verify the ETL functionalities, follow these steps:
    • Click on the menu, select ‘File > Open’ and choose Main.kjb from the following folder: /home/ubuntu/Desktop/snap4cityETL/Electric_vehicle_charging_kmz_phoenix/RealTime


Fig.: Spoon Interface: open the ‘Electric_vehicle_charging_kmz_phoenix’ ETL.

  • In the Spoon Interface the Main Spoon Job appears.


Fig.: Spoon Interface:  the ‘Electric_vehicle_charging_kmz_phoenix’ ETL opened.

  • To run the ETL process, click on the main ‘Run’ button; a popup will open with a set of default values (including the URL from which to download the dataset); then click on the second ‘Run’ button.

Fig.: Spoon Interface: launch an ETL (‘Run button’).

 

  • Once the process is launched, a set of windows will open. A relevant one is the ‘Logging’ window, from which it is possible to read the logs. When the logging window reports the sentence ‘Spoon – job has ended’, the job has ended with no errors.


Fig.:  Spoon Interface: ETL executes, logging details.

Connection from STATIC to REAL TIME

The real-time data are connected to the static data (loaded into the Km4city Knowledge Base) through the ServiceURI of the service; to verify the connection, check that the real-time records refer to the ServiceURI produced by the static ETL.

For example: http://www.disit.org/km4city/resource/eCharging_15EP22T2AA1S000070
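
As an illustration of this linkage (the record structure and values below are hypothetical; only the ServiceURI comes from the example above):

    # Sketch: a real-time observation is linked to the static service loaded in
    # the Knowledge Base through its ServiceURI. Field names and values are
    # hypothetical; only the ServiceURI comes from the example above.
    SERVICE_URI = "http://www.disit.org/km4city/resource/eCharging_15EP22T2AA1S000070"

    realtime_observation = {
        "serviceUri": SERVICE_URI,          # same URI produced by the static ETL
        "dateObserved": "2018-05-31T16:13:54Z",
        "freeChargingPoints": 2,
    }

    # Verifying the connection means checking that the real-time record points
    # to a ServiceURI that exists in the Knowledge Base.
    assert realtime_observation["serviceUri"] == SERVICE_URI
    print("real-time record linked to", SERVICE_URI)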