Test Case Title |
TC6.8 - ETL processes for data transformation, and exploiting MicroServices/API/RestCall |
Goal |
Snap4City, provides support for real time analytics which is performed by using a set of services. One of these is the possibility to: Create ETL processes capable to make transformations on data. Get data from SigFOX gateway. Ingesting data sets with real time data, connecting them with Knowledge Base. |
Prerequisites |
Using a PC or Mobile with a web browser The following functionalities are available only for specific Snap4city users with specific privileges. |
Expected successful result |
Create an ETL process for data transformation directly in the snap4city context |
Steps |
Please note that to correctly perform this Test Case you need and access to the ETL Virtual Machine as described below. To have access to a Virtual Machine to perform ETL please contact snap4city@disit.org. |
Users roles for which this test case is available: Area Manager, ToolAdmin, RootAdmin
Users roles for which the test case is not available: Manager
User used to describe the Example: ‘snap4city’
Examples:
- Login on the snap4city platform
- Open the VM from the snap4city home page
- Create the ETL processes DIRECTLY from the Virtual Machine in the snap4city portal
Users roles for which this test case is available: All
User used to describe this Examples: ‘snap4city’ (role: AreaManager)
- Go to the snap4city home page
- Make the login
- I can see the snap4city home page
- Now you can select one of the functionalities on the left panel
- Note that the list of the functionalities depends on the user role (and on the related permissions).
Example 1: Open the VM from the snap4city home page
Users roles for which this test case is available: AreaManager, ToolAdmin, RootAdmin
User used to describe this Examples: ‘snap4city’ (role: AreaManager)
- Open the VM from the snap4city home page (Example 1)
- Go to the snap4city home page
- Make the login
- If the user has the developer permissions (as provided for the ‘AreaManager’), he/she finds the menu ‘Development Tools > ETL development’
- Click on the menu and insert the correct password (the same credentials two times)
Fig.: vnc connection.
Fig.: Virtual machine connection.
Example 2: Create the ETL processes DIRECTLY from the Virtual Machine in the snap4city portal
Users roles for which this test case is available: AreaManager, ToolAdmin, RootAdmin
User used to describe this Examples: ‘snap4city’ (role: AreaManager)
- Make the login on the snap4city portal and open the Virtual Machine (Example 1)
- To create an ETL process, it is necessary the use of the Pentaho Kettle tool Spoon (http://www.pentaho.com) installed on the Virtual Machine available in the snap4city portal
ETL creation, follow these steps:
- Locate a dataset of interest, that can be static (e.g., the ‘http://www.nk.hel.fi/avoindata/avoin_data-avustukset.xls’) or real-time, then you can create the ETL (Extract Transform and Load) process to ingest the data.
- Open Spoon via a Terminal emulator (search the Icon at the bottom of the screen or click on ‘Application’ menu > ‘Terminal Emulator’), then write the command ‘spoon.sh’ and click ‘enter’.
Fig. –Virtual Machine (VM) Terminal Emulator (command line).
- The Spoon Interface will open, now it is possible to create/load ETL processes.
Fig.: Spoon Interface.
- To create a new ETL make a double click on the tab ‘jobs’ (or a right click and select ‘New’) in the left menu.
Fig.: Spoon search the menu for a new job.
- Then a new job is created
Fig.: Spoon new job.
- Now you can create your ETL connecting the steps (or blocks) available in the left menu (the connection is realized with the Hops or links).
- Kettle is based on two key concepts (from operating point of view):
- Job (with extension “.kjb”);
- Transformation (with extension “.ktr”), composed of several steps.
- Kettle is based on two key concepts (from operating point of view):
In kettle the data are rows flow from one step to another one. A job manages the sequential execution of lower-level entities: transformations or other jobs.
- Transformation and job can be realized using the steps, that are classified into three main categories:
- Input: process some kind of 'raw' resource (file, database query or system variables) and create an output stream of records from it.
- Output: (the reverse of input steps): accept records, and store them in some external resource (file, database table, etc.).
- Transforming: process input streams and perform particular actions on them (adding new fields/new records); these actions produce one or more output streams.
Fig.: some available Spoon steps.
- To put the step in a job or in a Transformation, the drag and drop functionality is active. The connection is done thank to the hops (link)
- The first step is the ‘Start’ and usually the last is ‘Success’
Fig.: Spoon job and first step: ‘Start’.
Fig.: Spoon job and first step: Job, Transformation, hops.
- To save a job or a transformation;
- Right click on the related icon
- Change the job/transformation name
- Select the directory into which the file is saved (is an xml file)
- The select the file name
Fig.: Spoon job saving.
- To connect to a web server a Spoon step is present:
- In a job, it is present in the menu ‘file management > HTTP’ or can be retrieved using the text search filter ‘Entries’.
Fig.: Spoon HTTP step in a job.
- In a Transformation, it is present in the menu ‘Lookup > HTTP Client’ (or HTTP Post), or can be retrieved using the text search filter ‘Entries’.
Fig.: Spoon HTTP step in a transformation.
SAMPLE: Helsinki_youth_subsidies (Static: the results are always the same or the same for a long period)
- A set of samples are already present in the VM, for example we can choose the ‘Helsinki_youth_subsidies’ that is capable to: i) download a dataset; ii) transform it; iii) save the new data in a file (into the VM file system). To verify the ETL functionalities follow the steps:
- Click on the menu and select: ‘File > Open’ and select the file from: ‘Ubuntu/Desktop/snap4cityETL/Helsinki_youth_subsidies_XLS/Ingestion/Maink.kjb’
- In the Spoon Interface the Main Spoon Job appears.
Fig.: Spoon Interface: open the ‘Helsinki_youth_subsidies_XLS’ ETL.
Fig.: Spoon Interface: the ‘Helsinki_youth_subsidies_XLS’ ETL opened.
- To run the ETL process, click on the main ‘Run’ button (), a popup will open with a set of default values (including the url from which to download the dataset: ‘’), click on the Second ‘Run’ button.
Fig.: Spoon Interface: launch an ETL (‘Run button’).
- Once launched the process, a set of windows will open. One relevant is the ‘Logging window’ from which it is possible to read the logs. When the logging windows report the sentence ‘‘Spoon – job has ended’, the job is ended with no errors.
Fig.: Spoon Interface: ETL executes, logging details.
- Open the ‘HTTP’ step to view how the connection with the web server it is realized:
- Click on the ‘HTTP’ Icon (right click) and select ‘edit’
Fig.: ETL execution: edit the HTTP step.
- In the URL field is present the direct url to download the dataset (or a parameter that can come from the job settings, from the Database, from a configuration file, etc.)
- In the Target file field is present the file path into which the data coming from the web server are written
Fig.: ETL execution: HTTP step details.
- What are the actions done by this process and how it is possible to verify them? Some block, can be expanded, with double left click or 1 right click of the mouse.
- Activities:
- block ‘Start’: start the process (see its details with: ‘right click > edit’),
- block ‘Create original data folder’:
- if not exists, creates a new folder for the downloaded file: ‘‘/home/ubuntu/Desktop/ETL_Output/Helsinki_youth_subsidies_XLS/Original_data’
- block ‘HTTP’: makes the HTTP REST request (to ‘http://www.nk.hel.fi/avoindata/avoin_data-avustukset.xls’) and saves the xls file (named ‘avoin_data-avustukset.xls’) in the folder created in the previous block
- block ‘File Exists’: checks if the uploaded file is already present
- block ‘Create Elaborated data folder’: ‘‘/home/ubuntu/Desktop/ETL_Output/Helsinki_youth_subsidies_XLS/Elaborated_data’’
- block ‘DUMMY’: do nothing
- block ‘process data’: writes the downloaded file, made data analysis and save the result in a new file in the folder ‘‘/home/ubuntu/Desktop/ETL_Output/Helsinki_youth_subsidies_XLS/Elaborated_data/Result_data.csv’
- block ‘Success’: ends the process.
- How to verify:
- Search the folder into the file system and verify the presence of the files, you can also delete them, and launch again the ETL process.
- Activities: