Test Case Title |
TC7.5 - Developing Data Analytics Processes |
Goal |
As AreaManager or higher-level user, I can:
- Develop and/or modify data analytics processes.
- Develop a new data analytics process in R, Java, Python, etc., using external services, direct access to the data store, or the Advanced Smart City API. This can be done (i) by downloading the provided VM, running it and developing inside it, or (ii) by accessing R Studio via the web.
- Derive the correlations among data by running, on the collected data sets, a set of algorithms that may identify correlations, anomalies, etc., and thus auto-tune among the families of machine learning and statistical algorithms in order to apply those that provide the best performance in terms of precision, AIC, ELBO, etc., according to the methods and context.
- Analyse data for correlation, etc.
- Get results from the executed process. |
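As an illustration of selecting among candidate models by AIC, the following is a minimal sketch in Python (one of the languages listed above). It is not the auto-tuning procedure used by the platform: the two candidate models, the sample values and all function names are assumptions made for the example. It uses the least-squares form of AIC for Gaussian errors, AIC = n ln(RSS/n) + 2k, which is valid up to an additive constant shared by all candidates.

```python
import math

def fit_constant(y):
    """Fit y = c by least squares; return (residual sum of squares, parameter count)."""
    mean = sum(y) / len(y)
    rss = sum((v - mean) ** 2 for v in y)
    return rss, 1

def fit_linear(x, y):
    """Fit y = a + b*x by least squares; return (residual sum of squares, parameter count)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return rss, 2

def gaussian_aic(rss, n, k):
    """AIC of a least-squares fit with Gaussian errors, up to an additive constant."""
    return n * math.log(rss / n) + 2 * k

# Made-up sample values with a clear upward trend (assumption, not Snap4City data).
x = [0, 1, 2, 3, 4, 5]
y = [1.0, 2.1, 2.9, 4.2, 5.1, 5.8]

candidates = {"constant": fit_constant(y), "linear": fit_linear(x, y)}
scores = {name: gaussian_aic(rss, len(y), k) for name, (rss, k) in candidates.items()}

# The candidate with the lowest AIC is preferred.
best = min(scores, key=scores.get)
print(scores)
print("selected model:", best)
```

Lower AIC is better: the penalty term 2k discourages extra parameters unless they reduce the residual error enough to justify them.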
Prerequisites |
A PC or mobile device with a web browser. A minimal skill in writing R programs. Get the R example provided. Upload, modify and run the example. Access R Studio remotely with your credentials. From R Studio it is possible to use direct access to the Data Store and/or the Smart City API. Create the final package and upload it to the ProcessLoader for execution. The following functionalities are available only to Snap4City users with specific privileges. |
Expected successful result |
Collect more than one data set from the Data Stores. Compute the correlations and build correlation matrices, estimate the descriptive statistics, produce predictions based on ARIMA (using AUTOARIMA to identify the best model), and produce graphs of data trends, trend comparisons, etc.
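To make the expected outputs more concrete, the sketch below computes descriptive statistics and a Pearson correlation matrix for two series. It is written in Python with the standard library only and does not reproduce the R script used in this test case; the column names and values are made up for illustration.

```python
import math
import statistics

def pearson(a, b):
    """Pearson correlation coefficient between two equally long numeric series."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def correlation_matrix(columns):
    """Return a nested dict: matrix[row_name][col_name] = Pearson correlation."""
    names = list(columns)
    return {r: {c: pearson(columns[r], columns[c]) for c in names} for r in names}

def describe(values):
    """Basic descriptive statistics of one numeric series."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

# Hypothetical, made-up observations (not Snap4City data).
data = {
    "traffic_flow": [120, 150, 180, 210, 240],
    "free_parking_slots": [80, 70, 60, 50, 40],
}

summary = {name: describe(values) for name, values in data.items()}
matrix = correlation_matrix(data)

for name, stats in summary.items():
    print(name, stats)
for row, cols in matrix.items():
    print(row, {c: round(v, 3) for c, v in cols.items()})
```

In this made-up sample the two series move in exactly opposite directions, so the off-diagonal correlation is close to -1; real data would normally give a weaker value.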
Steps |
Please note that to perform this Test Case correctly you need access to the R Studio Virtual Machine, as described below. To obtain access to a Virtual Machine running R Studio, please contact snap4city@disit.org. |
- Go to the R-Studio portal following the link: https://rstudio.snap4city.org
- Sign in to R-Studio
- Access the directories inside R-Studio:
- Click on the ‘Snap4City’ directory to open the directory that contains the R scripts
- Click on the ‘Snap4CityStatistics’ directory to view all the R scripts
Fig: Directories on R-Studio.
Please note that the picture above shows a clean interface, since we left the environment in a clean state. If you log in to the account after a colleague, you may find the state left by their previous operations. In that case, follow the steps as described, ignoring the state of the windows you find; the procedure will work anyway.
- Click on the ‘Function.R’ script to open it in the Source pane, in the upper-right corner of the window
Fig: R Scripts inside the ‘Snap4CityStatistics’ directory.
Fig: ‘Function.R’ Script opened on the Source pane.
- Run the R code lines. The Code panel, at the top left of the window, reports the steps performed:
- Select the ‘STEP 1’ code lines and click on Run: STEP 1 loads all the required libraries into R
Fig: ‘Function.R’ Script opened on the Source pane: STEP 1
- Select the ‘STEP 2’ code lines and click on Run: STEP 2 executes a SPARQL query to retrieve traffic flow data into R
- Select the ‘STEP 3’ code lines and click on Run: STEP 3 executes a SPARQL query to retrieve car park data into R
- Select the ‘STEP 4’ code lines and click on Run: STEP 4 integrates the data retrieved in the previous steps and joins them into a single dataset
- Select the ‘STEP 5’ code lines and click on Run: STEP 5 executes all the statistical analyses and predictions. Note that the analysis is complete when “STEP 5 COMPLETED - STATISTICAL ANALYSIS PERFORMED” is displayed on the Code panel
Fig: ‘Function.R’ Script opened on the Source pane: STEP 2 to STEP 5
Fig: ‘STEP 5 COMPLETED - STATISTICAL ANALYSIS PERFORMED’ message on the Code panel
- Statistical Analysis Results visualization:
- Click on ‘Snap4City’ to go back to the main directory
Fig: From the ‘Snap4CityStatistics’ directory to the ‘Snap4City’ directory
- Click on the ‘StatisticsOutput’ directory, inside the ‘Snap4City’ directory, to visualize statistical analysis results and trend graphs as .png files
Fig: ‘StatisticsOutput’ directory with the statistical analysis results
- Click on each .png file in the ‘StatisticsOutput’ directory to view the statistical analysis results and trend graphs: a new tab is opened for each file
Note that the ‘StatisticsOutput’ directory also contains .csv files, which are not shown in figures f, g, h. They can be exported and saved by ticking the checkbox next to each file.
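STEP 4 above joins the traffic flow and car park data into a single dataset. The contents of ‘Function.R’ are not reproduced in this document, so the following Python sketch only illustrates the general idea of such a join. The join key (a shared timestamp), the field names and the sample records are assumptions made for the example.

```python
def inner_join(left, right, key):
    """Join two lists of dict records on `key`, keeping only keys present in both."""
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Merge the two records; fields from `right` are added to those of `left`.
            joined.append({**row, **match})
    return joined

# Hypothetical records standing in for the results of the two queries.
traffic = [
    {"timestamp": "2019-01-01T08:00", "vehicle_flow": 120},
    {"timestamp": "2019-01-01T09:00", "vehicle_flow": 180},
    {"timestamp": "2019-01-01T10:00", "vehicle_flow": 150},
]
parking = [
    {"timestamp": "2019-01-01T09:00", "free_slots": 60},
    {"timestamp": "2019-01-01T10:00", "free_slots": 75},
    {"timestamp": "2019-01-01T11:00", "free_slots": 90},
]

dataset = inner_join(traffic, parking, "timestamp")
for row in dataset:
    print(row)
```

An inner join is used here so that every row of the combined dataset has both measurements; timestamps present in only one source are dropped.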