TC6.4 - Managing ETL processes via Resource Manager, upload, execute, monitor

Test Case Title

TC6.4 - Managing ETL processes via Resource Manager, upload, execute, monitor

Goal

I can

Upload a finalized ETL or Data Analytics process to the Resource Manager according to a predefined format, sharing processes with other stakeholders.

Put them in execution, on demand or periodically, via the Resource Manager and on the back office via DISCES.

Monitor the execution of a process, observing its status and the history of its executions, via the Resource Manager.

Get results from the executed process.

Put them in execution, on demand or periodically, on the back office via DISCES.

Monitor the execution of the process, observing its status and the history of its executions, via the Resource Manager from the web.

Get results from the executed process from the web.

Monitor their execution on DISCES.

Prerequisites

Use a PC or mobile device with a web browser. Acquire basic skills in producing ETL processes; see the provided user manuals.

Access the sandbox via VNC, using Pentaho Kettle installed on the Snap4City Cloud. Download VNC if needed from the Snap4City downloads page: https://www.snap4city.org/downloads .

Download the Virtual Machine (VM) with the development kit. Download an example ETL. Use Pentaho Kettle installed on your PC.

The following functionalities are available only to Snap4City users with specific privileges.

Expected successful result

Resource management: upload, share, browse, select and download resources such as processes, dashboards and applications, for sharing and reuse.

Steps

 

 

Please note that to correctly perform this Test Case you need access to the ETL Virtual Machine as described below. To obtain access to a Virtual Machine for ETL development, please contact snap4city@disit.org.

 

User roles for which this test case is available: Area Manager, ToolAdmin, RootAdmin

User roles for which the test case is not available: Manager

User used to describe the Example: ‘snap4city’

Data processes developed in ETL or with other data analytics tools (R, Java, Python, etc.) can be used to automatically:

  • Collect data from external sources, including external third-party services.
  • Perform data transformation and analytics.
  • Deliver data to external locations.
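The three stages listed above can be illustrated with a minimal sketch. It is not a Snap4City or Pentaho Kettle process: the station data, the field names and the occupancy calculation are all hypothetical, chosen only to show the collect, transform and deliver pattern in plain Python.

```python
import csv
import io
import json

# Hypothetical raw feed; a real process would fetch this from an external source.
RAW_CSV = """station,free_slots,total_slots
A,3,10
B,0,8
C,5,5
"""


def extract(raw_text):
    """Collect: parse the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: convert types and derive an occupancy ratio per station."""
    result = []
    for row in rows:
        free = int(row["free_slots"])
        total = int(row["total_slots"])
        result.append({
            "station": row["station"],
            "free_slots": free,
            "occupancy": round(1 - free / total, 2),
        })
    return result


def load(records):
    """Deliver: serialize the records; a real process would write them elsewhere."""
    return json.dumps(records, indent=2)


if __name__ == "__main__":
    print(load(transform(extract(RAW_CSV))))
```

Keeping each stage in its own function makes it easy to test the transformation separately from the input and output handling.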

The referential data could be produced from:

  • external sources via ETL, for example: traffic flows, parking status, etc.;
  • real-time data arriving at the infrastructure via IoT Brokers, for example from sensors, actuators, etc.;
  • results from Data Analytics processes in: ETL, C/C++, Java, Python, etc.;
  • collective traces from mobile applications, via Smart City API;
  • crowd sourcing from the city users: votes, ranks, images, etc.;
  • social media data, for example: Facebook, tweet.com;
  • events in the city: entertainment, traffic, accidents, etc.;
  • Knowledge Base of the smart city;
  • DataGate: producing data from those collected;
  • Advanced Smart City API: producing data from mobile usage;
  • results from Snap4City Applications;
  • logs collected by the EventLogger.

 


Example 1: Upload a finalized ETL or Data Analytics process and put it in execution, on demand or periodically, on the back office (DISCES) via the Resource Manager.

 


Fig.  b – Resource list and creation of a new resource in the Resource Manager.

 

  • If you click the ‘Upload New Resource’ button, a pop-up appears


Fig. c – Upload a new Resource on the Resource Manager.
 

  • To schedule your ETL you must:
    1. Create a Process Model associated with the ETL
    2. Create a New Instance of the Process Model (related to your ETL) and schedule the instance
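The ordering of the two steps can be pictured with a small in-memory sketch. The `Catalog` class and its methods are hypothetical and do not reflect how the Resource Manager is implemented; the only point is that an instance can be created solely for a resource that already has a Process Model.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessModel:
    """A Process Model tied to one uploaded resource (for example an ETL)."""
    resource_name: str
    instances: list = field(default_factory=list)


class Catalog:
    """Hypothetical stand-in for the bookkeeping of models and instances."""

    def __init__(self):
        self.models = {}

    def create_model(self, resource_name):
        # Step 1: associate a Process Model with the resource.
        model = ProcessModel(resource_name)
        self.models[resource_name] = model
        return model

    def create_instance(self, resource_name, schedule):
        # Step 2: only possible once the Process Model exists.
        if resource_name not in self.models:
            raise LookupError(f"create a Process Model for {resource_name!r} first")
        instance = {"model": resource_name, "schedule": schedule}
        self.models[resource_name].instances.append(instance)
        return instance


if __name__ == "__main__":
    catalog = Catalog()
    catalog.create_model("Electric vehicle charging")
    print(catalog.create_instance("Electric vehicle charging", "periodic"))
```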

1) How to Create a Process Model associated to the ETL:


Fig. d – Upload a new Resource on the Resource Manager.

 

  • Search for the ETL you want to schedule:
    • search, for example, for ‘Electric vehicle charging’, using the filter at the top right of the web page
    • select a Resource (for example: ‘Electric vehicle charging’)
    • click on the ‘NEW’ button (in the ‘Process Model’ column of the table) and enter the metadata. For the ‘Electric vehicle charging’ ETL, it is possible to enter:
      • ‘Process Parameters’ tab:
        • Name = ‘Electric vehicle charging’
        • Description = ‘info on Electric vehicle charging in Florence’
        • Group = ‘Services’
      • ‘Trigger’ tab:
        • Name = ‘Electric vehicle charging_trigger’
        • Description = ‘description…’
        • Group = ‘Services_trigger’
      • ‘Advanced Parameters’ tab:
        • Needed only if two different process instances must be executed in series (typically left empty).
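As a way to keep track of the values before typing them into the form, the metadata above can be written down as a plain Python dictionary, one entry per tab. This is only a note-taking sketch: the dictionary layout and the `missing_fields` helper are hypothetical, and treating Name, Description and Group as mandatory is an assumption, not something the Resource Manager is known to enforce.

```python
# Assumption: these three fields are treated as mandatory in this sketch.
REQUIRED_FIELDS = ("Name", "Description", "Group")

process_model_form = {
    "Process Parameters": {
        "Name": "Electric vehicle charging",
        "Description": "info on Electric vehicle charging in Florence",
        "Group": "Services",
    },
    "Trigger": {
        "Name": "Electric vehicle charging_trigger",
        "Description": "description…",
        "Group": "Services_trigger",
    },
    # Only needed when two process instances must run in series.
    "Advanced Parameters": {},
}


def missing_fields(form):
    """Return (tab, field) pairs that are absent or blank in the two main tabs."""
    problems = []
    for tab in ("Process Parameters", "Trigger"):
        values = form.get(tab, {})
        for name in REQUIRED_FIELDS:
            if not str(values.get(name, "")).strip():
                problems.append((tab, name))
    return problems


if __name__ == "__main__":
    print(missing_fields(process_model_form))
```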


Fig. e – Upload a new Resource on the Resource Manager.

 


Fig. f: New Process Model: Parameters

 


Fig. g: New Process Model: Advanced parameters

 


Fig: New Process Model: Trigger

 

2) How to create a New Instance of the Process Model (related to your ETL) and schedule it:

To put a Process in execution, at least one Process Instance (associated with the Process Model created in the previous step) must be created. The instance can then be executed in a scheduler. To do this, follow these steps:

  • Click on the ‘Process Model’ menu and search for a specific Process Model (e.g. the Process Model associated to the ‘Electric vehicle charging’ Resource). 


Fig. h: Process Models and new Instances.

  • Click on the ‘NEW’ Process Instance button (‘New Instance’ column of the table), enter the necessary metadata and create the instance by clicking the ‘Confirm’ button.


Fig. i: Create a new Instance of a Process Model (related for example to an ETL).

  • Click on the ‘VIEW’ button (‘Show Instances’ column of the table) to verify its presence


Fig. l: Verify the instance creation.

 

  • Now the process is in execution and you can find it in the list (tab ‘Processes in Execution’)


Fig. m: Processes in Execution list.

  • Once your process is in execution, you can monitor its status:
    • From the ‘Processes in Execution’ list page (see the figure above), it is possible to:
      • Act on its execution:
        • See the execution log

 

Fig. n: Processes in Execution details.

        • Start (or restart) the process execution
        • Stop the process execution
        • Delete the process execution
      • See the process execution from the scheduler (called DISCES) view: click on the link ‘Test Scheduler Node’ and the following view appears

 


Fig. o: Processes details.

 

  • It is possible to monitor the job from the home page:
    • Click on the Filter button (at the bottom of the page)
    • Make textual searches on each column


Fig. p: Processes in Execution search.
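The per-column textual search can be thought of as a filter that keeps a row only when every searched column contains the typed text. The sketch below illustrates that idea on an in-memory list; the column names, the sample rows and the case-insensitive matching are assumptions, not a description of how the page implements its filter.

```python
def filter_rows(rows, **column_filters):
    """Keep rows where each named column contains the given text (case-insensitive)."""
    def matches(row):
        return all(
            text.lower() in str(row.get(column, "")).lower()
            for column, text in column_filters.items()
        )
    return [row for row in rows if matches(row)]


# Hypothetical job list with made-up column names.
jobs = [
    {"name": "Electric vehicle charging", "group": "Services", "status": "NORMAL"},
    {"name": "Parking status", "group": "Services", "status": "PAUSED"},
]

if __name__ == "__main__":
    print(filter_rows(jobs, status="paused"))
    print(filter_rows(jobs, group="serv", name="electric"))
```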

Note that the "Status" value of a newly created job is set to "CREATED", and it changes each time the user starts or stops the job using the commands on the right side of the row. The content of the "Status" column is kept up to date by a function that, every two minutes, asks the schedulers in which the processes are registered for their execution state and updates the value according to the response received.

The values that the Status can assume are:

  • CREATED: The default value of a process right after it is created, before its status is updated for the first time.
  • NORMAL: The process is running correctly.
  • NONE: The trigger associated with the process does not exist (for example, because the execution time interval has ended).
  • BLOCKED: The execution of the process was blocked by the user.
  • PAUSED: The execution of the process has been paused.
  • ERROR SERVER COMMUNICATION: The request was sent successfully, but no response could be received because of internal issues on the scheduler server.
  • NOT FOUND: The application sent a request to the scheduler for that process, but the scheduler did not find any process matching the information sent.
  • RUNNING: The process assumes this value immediately after a start execution request has been sent.
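The status values and the two-minute refresh can be summarized in a short sketch. The enumeration mirrors the list above; everything else is hypothetical, in particular the `query_scheduler` callable and the way its answers are mapped to a status, which only illustrate one plausible reading of the description and not the actual Resource Manager code.

```python
from enum import Enum


class Status(Enum):
    CREATED = "CREATED"
    NORMAL = "NORMAL"
    NONE = "NONE"
    BLOCKED = "BLOCKED"
    PAUSED = "PAUSED"
    ERROR_SERVER_COMMUNICATION = "ERROR SERVER COMMUNICATION"
    NOT_FOUND = "NOT FOUND"
    RUNNING = "RUNNING"


POLL_INTERVAL_SECONDS = 120  # "every two minutes", as stated in the text


def refresh_status(current, query_scheduler):
    """Return the status after one polling round.

    query_scheduler is a hypothetical callable: it returns the scheduler's
    answer as a string, returns None when no matching process is found, and
    raises ConnectionError when the scheduler cannot answer.
    """
    try:
        answer = query_scheduler()
    except ConnectionError:
        return Status.ERROR_SERVER_COMMUNICATION
    if answer is None:
        return Status.NOT_FOUND
    try:
        return Status(answer)
    except ValueError:
        return current  # unrecognized answer: keep the previous value


if __name__ == "__main__":
    print(refresh_status(Status.CREATED, lambda: "NORMAL"))
```

A caller would invoke `refresh_status` once per process every `POLL_INTERVAL_SECONDS` and store the returned value in the "Status" column.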

 


Example 2: Put the ETL in execution on the Resource Manager directly from the Virtual Machine used to develop it.

  • See TC6.3 to create an ETL (open the VM from the Snap4City home page)
  • Go to the Snap4City home page
  • Log in
  • If the user has developer permissions (as provided for the ‘AreaManager’ role), the menu ‘Development Tools > ETL development’ is available
  • Click on the menu and enter the correct password (the same credentials twice, Fig q, r)


Fig.q1: vnc connection.