Test Case Title |
TC6.5 - Managing Heterogeneous File Ingestion via ETL processes |
Goal |
I can: Ingest files of any format through ETL processes |
Prerequisites |
Using a PC or mobile device with a web browser. Using ETL processes. The following functionalities are available only to specific Snap4City users with specific privileges. |
Expected successful result |
Generating an ETL process, or listing the capabilities and executing the examples provided in the development kit, accessible via VNC or in the VM installed on premises. |
Steps |
Please note that to perform this Test Case correctly you need access to the ETL Virtual Machine as described below. To obtain access to a Virtual Machine for ETL, please contact snap4city@disit.org. |
The Snap4City platform gathers city information from several sources via data-driven, stream, sporadic and/or periodic processes (ETL, Java or any other kind). Through ETL processes it can also convert this information for the Knowledge Base, reconciling it for geospatial queries. Once a file is imported (downloaded, for example), it has to be ingested: its data are mined and stored in HBase for versioning, reconciliation and quality improvement. Different formats and structures can be handled by creating a specific ETL process for each data source family. Several examples of ETL processes, developed and currently in place to manage data ingestion for the Smart City of Florence and Tuscany, are available on the DISIT Lab GitHub.
See ETL Smart City examples: https://github.com/disit/smart-city-etl
In practice, ETL processes gather both static and real-time data by collecting files over HTTP/FTP, for example: data from traffic sensors, parking lots, weather forecasts, fuel costs, environmental data, etc. The mined data are then stored in NoSQL databases such as HBase/Phoenix or Mongo and/or, as triples, in RDF storage for the Knowledge Base. This makes the data available for data analytics, dashboards, etc. You can start testing this requirement by following the instructions described for the ProcessLoader.
A Snap4City ETL process can upload, transform and manage data: for example, downloading a file from an external data source, extracting its contents and saving them in a database or in the file system.
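The download, extract and save sequence described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Snap4City implementation: the archive is built in memory in place of a real HTTP/FTP download, SQLite stands in for the HBase/Phoenix or Mongo storage, and every name used (`make_sample_archive`, `extract_rows`, `transform`, `load`, the `pharmacy` table and its sample rows) is hypothetical.

```python
import csv
import io
import sqlite3
import zipfile


def make_sample_archive() -> bytes:
    """Build a small zip in memory; it stands in for a file fetched over HTTP/FTP."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(
            "pharmacies.csv",
            "name,lat,lon\nFarmacia A,43.77,11.25\nFarmacia B,43.78,11.26\n",
        )
    return buf.getvalue()


def extract_rows(archive: bytes, member: str) -> list:
    """Extract: unpack one CSV member of the archive into a list of dicts."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        text = zf.read(member).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list) -> list:
    """Transform: convert coordinates to floats, skipping rows that do not parse."""
    records = []
    for row in rows:
        try:
            records.append((row["name"], float(row["lat"]), float(row["lon"])))
        except (KeyError, TypeError, ValueError):
            continue
    return records


def load(conn: sqlite3.Connection, records: list) -> int:
    """Load: upsert the records and return the number of rows now stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pharmacy (name TEXT PRIMARY KEY, lat REAL, lon REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO pharmacy VALUES (?, ?, ?)", records)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM pharmacy").fetchone()[0]


conn = sqlite3.connect(":memory:")  # assumption: SQLite replaces the real NoSQL store
stored = load(conn, transform(extract_rows(make_sample_archive(), "pharmacies.csv")))
print(stored)
```

Keeping the three phases as separate functions means the download source or the target store can be swapped without touching the transformation logic.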
Examples of sources and data ingestion processes by ETL and Snap4City Applications are:
- OpenStreetMap: from OSM to the Knowledge Base, also integrating civic number locations from any Buyer database; see also the process described in the manual "From the Open Street Map to the Km4City street graph.pdf", uploaded to Google Drive.
- To test click on the link to see live examples:
- Antwerp: https://antwerp.snap4city.org/
- Helsinki: https://helsinki.snap4city.org/
- Tuscany: https://servicemap.km4city.org/
- Sardinia, Emilia Romagna, Veneto: https://www.disit.org/smosm/
- GTFS data about Public Transportation schedule, stops, paths, etc.;
- To test click on the link to see a live example: see ETL table below
- DATEX information about the ITS of the city, events;
- see ETL table below
- Crawling public web pages for collecting additional information;
- For example, the crawling of Hospital Triage: see ETL table below
- Open Data of the city;
- To test click on the link to see a live example: see ETL table below
- Parking status;
- To test click on the link to see a live example: see ETL table below
- Twitter data from Twitter.com directly and/or from TwitterVigilance;
- To test click on the link to see a live example:
- XML, HTML, JSON, CSV, WSDL, XLS formats
- See the table below containing links to live examples
- Data accessible as External Services registered on the platform in the MicroService Directory (not yet available). At present they are ingested directly by calling the services with their own protocols.
- SigFox gateways with Node.js and wrappers, or ETL, etc.
- LoRa gateways with Node.js and wrappers, or ETL, etc.
- Any other format and protocol can be easily added
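One way to picture how heterogeneous formats can share a single ingestion entry point is a parser registry keyed by file extension. The sketch below is an assumption made for illustration and does not reproduce how the Snap4City ETL processes are built: `PARSERS`, `parse_file` and the three parser functions are hypothetical names, and only JSON, CSV and XML are covered.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from pathlib import PurePath


def parse_json(text):
    """Return a list of records; a single JSON object is wrapped in a list."""
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


def parse_csv(text):
    """Return one dict per CSV row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_xml(text):
    """Each child of the root becomes a record; its own children become fields."""
    root = ET.fromstring(text)
    return [{field.tag: field.text for field in item} for item in root]


# Hypothetical registry: supporting a new format means adding one entry here.
PARSERS = {".json": parse_json, ".csv": parse_csv, ".xml": parse_xml}


def parse_file(filename, text):
    """Pick the parser from the file extension and return a list of dict records."""
    suffix = PurePath(filename).suffix.lower()
    try:
        parser = PARSERS[suffix]
    except KeyError:
        raise ValueError(f"no parser registered for {suffix!r}") from None
    return parser(text)


records = parse_file("parking.json", '[{"id": "P1", "free": 12}]')
print(records)
```

Under this design, every source ends up as the same list-of-dicts shape, so the later storage step does not need to know which format the file arrived in.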
The following ETL files are included in the ETL zip file (see link above):
Source | ETL Description |
Florence_firstAid_accesses_HTML | This ETL (static) |
Florence_Parking_JSON (static & realTime) | This ETL is composed of two phases: STATIC phase: REAL TIME phase: |
Florence_Weather_XML (Arpat, Tuscany region) | This ETL (REAL TIME): |
Florence_Pharmacies_CSV | This ETL (static) |
Helsinki_youth_subsidies_XLS | This ETL (static) |
Electric_vehicle_charging_kmz | This ETL is composed of two phases: REAL TIME phase: |
Electric_vehicle_charging_kmz_phoenix | This ETL is composed of two phases: Static phase: REAL TIME phase: |
Bike_Sharing_Areas_Shp | This ETL (static) |
Tpl_bus_gtfs | This ETL (periodic) |
Smartbench | This ETL (REAL TIME): |
LinkedData | This ETL (static) |
Florence_School_canteen (Disit FTP) | This ETL (static) |
Tuscany_parking | This ETL (static) |
via_francigena_farmhouse_GeoJson | This ETL (static) |
sigFOX | This ETL (REAL TIME): |
From_KM4cityKB_to_Datagate | This ETL (REAL TIME): |