HOW TO: Perform Fast Data Loading in Python (FastDataLoader)

To load data quickly from a storage system, database, or repository into the NGSI V2 broker of Snap4City, you can use a Python script.

Fast data loading (FastDataLoader in Python) can easily reach:

  • up to 8,400 entity messages per minute (each with 10 variables, plus GPS, etc.) on the Orion Broker, in an authenticated/authorized manner;
  • which corresponds to about 12 million messages per day (8,400 × 60 × 24), or about 4.4 billion messages per year.

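The Python script can be used from the command line or on Snap4City; both options are described in the sections “Use from command line” and “Use on Snap4City”.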

 

Configuration file

The configuration file must be called “conf.json” and must be saved in a folder called “data”. The fields in the file that must be filled in are:

  • kill: the total number of failures that forces the program to stop;
  • toll: the number of failures within the last 100 insertions that forces the program to stop;
  • dataFolder: the name of the folder containing the data to be processed;
  • threadNumber: the number of threads spawned to perform insert operations on devices;
  • username and password: the credentials of the account that contains the devices;
  • refreshTime: the number of minutes before the token is refreshed;
  • clientID: provided by the system administrator;
  • clientSecret: provided by the system administrator;
  • token url: provided by the system administrator;
  • patch url: provided by the system administrator;
  • mapping: contains the parameters needed by the device and the path to access each value in the JSON data, using $ as the separator and % as the array separator. The data must be saved as an array inside a JSON file.
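The interplay between kill and toll can be illustrated with a minimal sketch. The class name FailureMonitor, the method record, and the choice to stop as soon as a counter reaches its threshold are assumptions made for illustration; the actual logic in localScript.py may differ.

```python
import threading
from collections import deque


class FailureMonitor:
    """Hypothetical helper: tracks insert outcomes and says when to stop."""

    def __init__(self, kill, toll, window=100):
        self.kill = kill                      # "kill": limit on total failures
        self.toll = toll                      # "toll": limit on failures in the window
        self.total_failures = 0
        self.recent = deque(maxlen=window)    # True marks a failed insertion
        self.lock = threading.Lock()          # several worker threads may report

    def record(self, success):
        """Register one insertion outcome; return True if the run should stop."""
        with self.lock:
            failed = not success
            self.recent.append(failed)        # oldest outcome drops out automatically
            if failed:
                self.total_failures += 1
            # Assumption: stop as soon as either threshold is reached.
            return (self.total_failures >= self.kill
                    or sum(self.recent) >= self.toll)


monitor = FailureMonitor(kill=5, toll=3)
outcomes = [True, False, True, False, False]
stops = [monitor.record(ok) for ok in outcomes]
print(stops)  # the third failure reaches toll, so only the last entry is True
```

The bounded deque keeps only the last 100 outcomes, so old failures stop counting towards toll while they still count towards kill.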

Figure 1: Config example

Parser

The parser is the function that transforms the raw sensor data into data that can be inserted into the device. The parsing function must take the configuration file as input, from which it can extract the name of the folder containing the data. If you don’t want to change the code of localScript.py, use the following names:

  • parserFunc.py for the parser program;
  • parser for the class contained in the program;
  • parse for the method that does the parsing.

The function must return a list of all the parsed data in JSON format, as in Figure 3.
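A parserFunc.py following this naming could look like the sketch below. Only the names parser, parse, and dataFolder come from this page. The raw record layout (sensor, time, speed), the helper to_entity, and the assumption that parse receives the configuration as an already loaded dictionary are hypothetical; the output shape is modelled on the entity shown in the “Output” section.

```python
import json
import os


class parser:
    """Sketch of a parser; class and method names match localScript.py's defaults."""

    def parse(self, config):
        folder = config["dataFolder"]         # folder holding the raw JSON files
        parsed = []
        for name in sorted(os.listdir(folder)):
            if not name.endswith(".json"):
                continue                      # ignore anything that is not JSON
            with open(os.path.join(folder, name), encoding="utf-8") as fh:
                rows = json.load(fh)          # each file holds a JSON array
            for row in rows:
                parsed.append(self.to_entity(row))
        return parsed

    @staticmethod
    def to_entity(row):
        # Hypothetical raw layout: {"sensor": ..., "time": ..., "speed": ...}
        return {
            "id": row["sensor"],
            "type": "traffic",
            "dateObserved": {"type": "string", "value": row["time"]},
            "averageSpeed": {"type": "float", "value": float(row["speed"])},
        }
```

Adapting the sketch to another data source should only require changing to_entity, since the file loop does not depend on the record layout.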

Examples

Figure 2: Raw data

 

Figure 3: Parsed data

Execute

Once the configuration file and the parser function have been created and placed correctly in the same folder as localScript, the program can be launched. If everything is in the right place, the program should print a string like: “Running on http://127.0.0.1:8080”.

To start processing the data and inserting it into the devices, open that link in a browser with /scriptBello appended (the full link will look like this: http://127.0.0.1:8080/scriptBello). Below is an example of what a run without errors should look like.
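The same request can also be sent without a browser. The sketch below assumes that the route answers a plain GET request, as a browser would send; the function names script_url and start_loading are hypothetical.

```python
import urllib.request


def script_url(base_url="http://127.0.0.1:8080", route="/scriptBello"):
    """Join the address printed at startup with the route that starts loading."""
    return base_url.rstrip("/") + "/" + route.lstrip("/")


def start_loading(base_url="http://127.0.0.1:8080"):
    """Send the GET request a browser would send and return the response body."""
    with urllib.request.urlopen(script_url(base_url)) as response:
        return response.read().decode("utf-8", errors="replace")


print(script_url())
```

Calling start_loading() only makes sense while the program is running and listening on the printed address.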

Figure 4: Execution example

 

Output

When the program is launched, it starts by converting the raw data with the parser. Once this is done, the program should print the access token. If an error is returned instead, check the credentials entered in the “token” section of the configuration file.

Once the token has been generated, the data loading will start and a new access token will be printed every 3 minutes (or the number of minutes written in “refreshTime” in the config file).
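The periodic refresh can be pictured with the following sketch. The class TokenManager and its injected fetch_token callable are hypothetical; how the real script requests the token from the token url is not shown here.

```python
import time


class TokenManager:
    """Hypothetical helper: caches a token and renews it after refresh_minutes."""

    def __init__(self, fetch_token, refresh_minutes=3, clock=time.monotonic):
        self.fetch_token = fetch_token        # callable returning a new token string
        self.refresh_seconds = refresh_minutes * 60
        self.clock = clock                    # injectable so the logic can be tested
        self.token = None
        self.fetched_at = None

    def get(self):
        """Return a valid token, fetching a new one when the old one is too old."""
        now = self.clock()
        expired = (self.fetched_at is None
                   or now - self.fetched_at >= self.refresh_seconds)
        if expired:
            self.token = self.fetch_token()
            self.fetched_at = now
            print("access token:", self.token)
        return self.token
```

In such a design, every worker thread would call get() before an insert, so that a new token is requested only once per refresh interval.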

Once the script has finished running, the output will be a list of all messages that were not inserted due to errors, each with the corresponding error, as in the following example:

{"JSON":
    {"id": "METRO767", "type": "traffic",
     "anomalyLevel": {"type": "float", "value": 100.4045},
     "averageSpeed": {"type": "float", "value": 20.390244},
     "avgTime": {"type": "float", "value": 13.76345},
     "concentration": {"type": "float", "value": 6.4736843},
     "congestionLevel": {"type": "float", "value": 117.63632},
     "dateObserved": {"type": "string", "value": "2022-04-07T19:51:00.000Z"},
     "vehicleFlow": {"type": "float", "value": 264.0}
    },
 "error": "500 Server Error:   for URL: https://broker1.snap4city.org:8080/v2/*******"}

 

Errors

All errors that occur during data insertion are recorded in the fail.txt file, which is generated automatically during program execution. For each failed insert, the file reports the JSON of the message followed by the error that caused the failure.
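A failure log of this kind could be written as in the sketch below. The function name log_failure and the one-record-per-line layout are assumptions; the keys "JSON" and "error" follow the example in the “Output” section, but the exact layout of the real fail.txt may differ.

```python
import json


def log_failure(entity, error, path="fail.txt"):
    """Append one failed insert and its error to the failure log."""
    record = {"JSON": entity, "error": str(error)}
    with open(path, "a", encoding="utf-8") as fh:
        # Assumption: one JSON record per line, so the file is easy to re-read.
        fh.write(json.dumps(record) + "\n")
    return record
```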

If you get an error like “JWT: 'module' object has no attribute 'encode'”, run the following commands:

  • pip uninstall JWT
  • pip uninstall PyJWT
  • pip install PyJWT

 

Use from command line

To run the script from the command line, Python must be installed on your PC. To launch the script, navigate to the folder where it is saved and type, for example, “python localScript.py”. Then follow the instructions given in “Execute”.

 

Use on Snap4City

To use the script on Snap4City, first rename localScript.py to daScript.py. Then include the script and the parser in a .zip file. Once the zip has been created, insert it in the “python-data-analytics” block, write “/scriptBello” in “Relative URI”, click the “create python data analytic” button, and then deploy.
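The renaming and packaging step can be scripted, as in this sketch. The function name build_package and the archive name fastDataLoader.zip are hypothetical, and the sketch copies localScript.py to daScript.py instead of renaming it, so that the local version stays usable.

```python
import shutil
import zipfile
from pathlib import Path


def build_package(folder=".", archive="fastDataLoader.zip"):
    """Create the zip expected by the python-data-analytics block."""
    folder = Path(folder)
    # Copy rather than rename, so localScript.py can still be run locally.
    shutil.copyfile(folder / "localScript.py", folder / "daScript.py")
    archive_path = folder / archive
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in ("daScript.py", "parserFunc.py"):
            zf.write(folder / name, arcname=name)   # store files at the zip root
    return archive_path
```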

This is what the configuration should look like.

Figure 5: Node-red flow

The following code must be entered in the “function” block:

Figure 6: function node content

 

The contents of the configuration file must be entered in the config block, as shown in Figure 7. This example relates to the configuration file for loading Metro data.

Figure 7: Config node content

 

Compared to the configuration file of the previous version, there are additional values to configure:

  • ParserCode: the code identifying which parser to use among those included in parserFunc;
  • from_date and to_date: specify the period of time (in days) to send with the program. The days are taken from the folder specified in dataFolder.

The data must be uploaded beforehand via FileZilla.

To download the data from FileZilla into the container where the script is executed, some code must be added to the parser function.

This code must connect to FileZilla and download the data into a specially created folder, as seen in lines 9 and 10 of Figure 8. The first 2 lines connect via FTP, while line 4 enters the directory in FileZilla where the data has been stored. From line 12 onwards, a loop explores the folder in FileZilla and downloads only the files for the days ranging from "from_date" to "to_date". The code in Figure 8 is specific to the data collected by the Metro sensors; it does not work with data saved in folders that follow a different scheme.

Figure 8: FTP configuration and use
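As a rough illustration of the same idea (not a reproduction of Figure 8), the sketch below uses Python's standard ftplib module. It assumes one remote folder per day named as an ISO date (for example 2022-04-07), which may not match the Metro folder scheme; the function names and all parameters are hypothetical.

```python
import os
from datetime import date
from ftplib import FTP


def days_in_range(names, from_date, to_date):
    """Keep the folder names that are ISO dates within [from_date, to_date]."""
    selected = []
    for name in names:
        try:
            day = date.fromisoformat(name)
        except ValueError:
            continue                          # not a day folder, skip it
        if from_date <= day <= to_date:
            selected.append(name)
    return sorted(selected)


def download_days(host, user, password, remote_dir, local_dir, from_date, to_date):
    """Download the files of every day folder in the requested period."""
    os.makedirs(local_dir, exist_ok=True)
    with FTP(host) as ftp:                    # connect via FTP
        ftp.login(user, password)
        ftp.cwd(remote_dir)                   # enter the directory holding the data
        for day in days_in_range(ftp.nlst(), from_date, to_date):
            os.makedirs(os.path.join(local_dir, day), exist_ok=True)
            for entry in ftp.nlst(day):
                # Some servers return "day/file", others just "file".
                filename = os.path.basename(entry)
                target = os.path.join(local_dir, day, filename)
                with open(target, "wb") as fh:
                    ftp.retrbinary("RETR " + day + "/" + filename, fh.write)
```

Keeping the date selection in its own function makes it possible to check the period logic without an FTP connection.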