The Data Processing Development Environment (VM) can be used for developing Snap4City Applications, ETL processes, and Data Analytics processes. It is a personal or shared sandbox for developers, who may use it:
- from the cloud, via web VNC access from the menu on the left of www.snap4city.org;
- on their own computer and desktop, by downloading the VM and running it with VMware Player. In this way, users may develop Snap4City Applications directly in their own environment.
When data transformations have to be developed to create new services, we suggest using ETL processes rather than IoT Applications. ETL processes can be developed with the approach described in this document. Direct access to the VM on the cloud is the most powerful solution and environment for developers: developers accessing the VM find an integrated environment in which they may develop:
- ETL processes for data transformation, as described above, via ETL developer tools such as Pentaho Kettle, which can cope with any kind of protocol and format. In addition, many smart city examples are provided on GitHub/DISIT and from the Resource Manager: https://processloader.snap4city.org/
- Karma, for developing automated XML mappings, for example to pass from data in a MySQL table to RDF triples;
- R, Python, and Java, installed locally, for data analytics development;
- Node-RED, for Snap4City Application development, together with the Snap4City MicroServices, which are also available in the VM (the IoT Directory is missing); however, we suggest developing directly as Snap4City IoT Applications;
- DISCES, for local scheduling and testing. The real DISCES of the Snap4City back office is accessible only to the ToolAdmin; on your request, the administrator can move a process loaded into your local DISCES to it when needed.
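Karma builds the mapping from a relational table to RDF interactively, from a semantic model. To show only the idea behind this triplification step (table rows in, RDF triples out), the following Python sketch maps rows to N-Triples by hand. It is not Karma's API or output; the base URI, the column names, the choice of predicates, and the sample rows are all hypothetical.

```python
from urllib.parse import quote

# Hypothetical base URI for the generated subject resources
BASE_URI = "http://example.org/resource/"

# Hypothetical column -> predicate mapping (a list keeps the output order stable)
COLUMN_TO_PREDICATE = [
    ("name", "http://schema.org/name"),
    ("latitude", "http://www.w3.org/2003/01/geo/wgs84_pos#lat"),
    ("longitude", "http://www.w3.org/2003/01/geo/wgs84_pos#long"),
]


def escape_literal(value):
    """Escape the characters that N-Triples does not allow verbatim in a literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def row_to_ntriples(row, key_column="id"):
    """Map one table row to N-Triples lines, one triple per non-NULL mapped column.

    Values are written as plain string literals for simplicity (no datatypes).
    """
    subject = "<{0}{1}>".format(BASE_URI, quote(str(row[key_column]), safe=""))
    triples = []
    for column, predicate in COLUMN_TO_PREDICATE:
        value = row.get(column)
        if value is None:
            continue  # SQL NULL: emit no triple for this column
        triples.append('{0} <{1}> "{2}" .'.format(
            subject, predicate, escape_literal(str(value))))
    return triples


if __name__ == "__main__":
    # Rows as they might be returned by a SELECT on a MySQL table (hypothetical data)
    rows = [
        {"id": "sensor1", "name": "Parking A", "latitude": 43.77, "longitude": 11.25},
        {"id": "sensor2", "name": "Parking B", "latitude": None, "longitude": 11.26},
    ]
    for row in rows:
        print("\n".join(row_to_ntriples(row)))
```

A real mapping would also assign RDF types and typed literals (for example xsd:decimal for coordinates); these are omitted here to keep the row-to-triple idea visible.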
Once a process has been developed, it can be tested as a scheduled process by using a local DISCES (a local stand-alone instance of the Smart Cloud Scheduler). Any process, once tested locally, can be loaded on the ProcessLoader to be submitted, so that it can be approved and then executed automatically in the back office by the real DISCES.
Please note that the following links may be accessible only to registered users.
On your premises:
The VM for download, to be run with VMware Player.
User manuals to download:
- https://www.snap4city.org/download/video/ETL_and_Console_of_the_Virtual_Machine_-_User_Manual.pdf (in English)
- Quick guide to VM Snap4City: https://www.snap4city.org/download/video/Snap4city_VM_Quick_guide.pdf
- https://www.disit.org/7107 (external link, in Italian)
- https://www.disit.org/6690 (external link, in Italian)
- Pentaho: https://www.pentaho.com/
- Pentaho Data Integration steps: https://wiki.pentaho.com/display/EAI/Pentaho+Data+Integration+Steps
Source code is included in the VM, while the ETL source code is accessible on:
Previous and additional documentation
- Sii-Mobility: DE4.2a-Data acquisition and aggregation system: from concept to data, from data to database with ETL, and from database to ontological model (ITA, ENG)
- Programming guide: ETL programming for Data Warehouses (ITA)
- User manual for creating ETL processes for static and dynamic data
- Recommended books
  - Pentaho Data Integration 4 Cookbook - Packt Publishing (A. S. Pulvirenti, M. C. Roldán)
  - Pentaho Kettle Solutions - Wiley (M. Casters, R. Bouman, J. van Dongen)
VMSDETL, with Linux Ubuntu 14.04 (root: ubuntu, password: ubuntu)
- this is the LINK to the virtual machine (version 0.7, 28-02-2017), to be downloaded and unzipped into a directory; it includes Karma
- user manual for creating ETL processes for static and dynamic data
- it can be run with VMware Player or VMware Workstation
- once the VM has started:
  - log in with the credentials root: ubuntu, password: ubuntu
  - to change the IP address of the VM, for example if the VM cannot reach the network, use network-admin or Settings
  - to start the required services/applications, see the instructions below
The VM contains a development system prepared with the following tools. They are listed for your information: they do not need to be installed, but in some cases they must be started:
- Oracle Java 7 JDK (required by Pentaho Data Integration and Apache HBase)
  - http://www.oracle.com/technetwork/java/javase/downloads/index.html (Oracle link)
  - https://help.ubuntu.com/community/Java (Ubuntu link)
- Pentaho Data Integration (PDI) ver. 5.0.1 (ETL tool)
  - http://sourceforge.net/projects/pentaho/files/Data%20Integration/
  - Start it from the data-integration folder with the command ./spoon.sh
- XAMPP (MySQL database)
  - http://wiki.ubuntu-it.org/Server/Xampp
  - Start it with the command sudo /opt/lampp/lampp start, from a shell.
  - Stop it with the command sudo /opt/lampp/lampp stop, from a shell.
  - Access it from PDI with username=disit and password=ubuntu.
- Apache HBase ver. 0.90.5 (NoSQL database), used in standalone mode
  - https://archive.apache.org/dist/hbase/hbase-0.90.5/
  - Start it with the command start-hbase.sh, from a shell inside the /bin folder.
  - Stop it with the command stop-hbase.sh, from a shell inside the /bin folder.
  - Check that it is running with the jps command from a shell.
  - Check that it is running from the web interface at http://localhost:60010/master.jsp.
- h-rider ver. (optional tool for viewing/manipulating the data stored in HBase, a NoSQL database for big data)
- Karma data integration ver. 2.024 (required for the triplification phase)
  - https://github.com/usc-isi-i2/Web-Karma/wiki
  - Start it with the command mvn -Djetty.port=9999 jetty:run from the /programs/Web-Karma-master/karma-web folder.
  - Access it from the web interface at http://localhost:9999.
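Once the services above have been started, their status can also be checked from a script. The following Python sketch probes the HBase master web interface (port 60010) and the Karma web interface (port 9999) given above, plus MySQL on port 3306, which is only assumed here to be the XAMPP default. It also parses the output of jps looking for an HMaster process, on the assumption that standalone HBase appears under that name. It avoids f-strings so that it should also run on the older Python 3 shipped with Ubuntu 14.04.

```python
import socket
import subprocess

# Ports to probe: 60010 and 9999 come from the instructions above;
# 3306 is an assumption (the usual default port of XAMPP's MySQL)
SERVICES = [
    ("MySQL (XAMPP)", 3306),
    ("HBase master web interface", 60010),
    ("Karma web interface", 9999),
]


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:  # refused, unreachable, or timed out
        return False
    conn.close()
    return True


def jvm_names(jps_output):
    """Parse jps output (one "<pid> <name>" pair per line) into a set of names."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            names.add(parts[1].strip())
    return names


def hbase_running():
    """Assumption: standalone HBase is listed by jps as a single HMaster JVM."""
    try:
        out = subprocess.check_output(["jps"], universal_newlines=True)
    except (OSError, subprocess.CalledProcessError):
        return False  # jps is not on the PATH, or it failed
    return "HMaster" in jvm_names(out)


if __name__ == "__main__":
    for name, port in SERVICES:
        state = "up" if is_port_open("localhost", port) else "down"
        print("{0} (port {1}): {2}".format(name, port, state))
    print("HBase HMaster process:", "found" if hbase_running() else "not found")
```

A port that answers only shows that something is listening; the web interfaces listed above remain the authoritative check.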