Data Analytics: Analyzing Public Transportation Offer wrt Mobility Demand


A deep understanding of how public transportation services are used is a fundamental step for improving the offered services on the basis of the dynamic demand of mobility. Moreover, to evaluate the impact of changes in the public transportation services offered to commuters, specific analyses and simulations should be performed to save time and costs. For instance, in a real case, the mobility officer of a city may receive complaints about recurrent crowding conditions on a bus line and on specific segments of it, connecting a number of specific bus stops. The corresponding actions to solve the problem could be to perform some on-site analyses and interviews, and then to implement some changes in the public transportation service in agreement with the operator (changes to time schedules, paths of the bus lines, frequency of the service, addition of bus lines, etc.). The alternatives and their combinations can be many; thus, to avoid proceeding by trial and error, a deep analysis by simulation is vitally required to better understand the causes of critical issues and the impact of changes before physically performing them.

Therefore, the simulation and analysis of people flows in the city is attracting growing attention due to a wide spectrum of related applications (e.g., [36]). In particular, several approaches have been proposed for predicting how people use bus lines. Most of them assume that tracking data of travelers, which can be used to model and predict human mobility, are available. For example, in [5], the tracking data of commuters are collected by using public transport IC cards. It is assumed that the drop-off probability of passengers at a bus stop follows a uniform or a standard normal distribution, which is quite unrealistic. In other cases, the stops are labeled as “small”, “medium”, and “large” in terms of the volume of passengers exchanged. Then, the drop-off probability at a stop is obtained from this labeling, based on the intuition that larger stops may attract more passengers and, as a result, the drop-off probability at such stops may be higher than at others. In [3], a methodology for evaluating the quality of stop boarding and alighting has been presented. Part of that research focuses on estimating the alighting stop of a stage in a multiple-stage trip when the associated boarding stop is available. A different approach has been proposed in [6] to estimate real-time passenger flow for urban bus transit systems. In that case, the number of people with smart cards and on-board tickets who are picked up at a station is estimated by considering two consecutive tapping records. Then, for a given bus trip, after a real-time estimation of the number of passengers on the bus, the number of on-board passengers at the remaining stations of the trip is estimated using a Kalman filter proposed by the authors. In addition, in order to place the proposed work in the context of the state of the art, a number of tools for simulating people flows have been reviewed, including MATSim [7], SUMO [8], and TRANSIMS [9], just to mention a few.
Most of them do not address the estimation of pick-ups and drop-offs at stops or the analysis of commuters’ behavior. Another limitation of these tools is their limited capability to consider contextual data regarding the city structure, and thus the motivations for getting on and off the bus. In summary, most of the above-mentioned solutions assume: (1) the possibility of tracking passengers (e.g., using public transport IC cards or mobile device tracking data). This, however, is not viable in real scenarios. Please note that in most modern solutions tapping is not mandatory for regular commuters and common city users, and thus relevant errors are produced. The counting of passengers on board may be available and may be performed on buses, while counting people at the bus stops (with details on drop-offs and boardings per bus line) is typically very expensive and not easy to perform, since people could be waiting for several bus lines and most bus stops are served by multiple bus lines; (2) single-stage trips, where commuters need to take only one bus to reach their destinations.
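To make the contrast between the drop-off assumptions discussed above concrete, the following minimal Python sketch compares a uniform drop-off distribution with a label-based one in the spirit of the "small/medium/large" stop-labeling strategy. The label weights (1, 2, 3) and the function names are hypothetical and chosen only for illustration; they are not taken from [5] or from any other cited work.

```python
# Hypothetical label weights: the cited works do not report these values here.
LABEL_WEIGHTS = {"small": 1.0, "medium": 2.0, "large": 3.0}


def uniform_probabilities(n_stops):
    """Uniform assumption: every remaining stop is equally likely as a drop-off."""
    if n_stops <= 0:
        raise ValueError("n_stops must be positive")
    return [1.0 / n_stops] * n_stops


def label_probabilities(remaining_labels):
    """Label-based assumption: the drop-off probability of each remaining stop
    is proportional to the (hypothetical) weight of its size label,
    normalized so that the probabilities sum to 1."""
    weights = [LABEL_WEIGHTS[label] for label in remaining_labels]
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one remaining stop is required")
    return [w / total for w in weights]
```

For three remaining stops labeled small, large, and medium, the uniform assumption gives each stop one third, while the label-based one gives roughly 0.17, 0.5, and 0.33. Both remain purely structural priors: neither uses the contextual data (residences, workplaces, schools) that motivate people to get off at a given stop.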

In this paper, focusing on the bus as the public transport mode, a model and simulator for analyzing the offer of public transportation services with respect to the demand of mobility is proposed, called ODA (Offer vs Demand Analyzer). In particular, the proposed model aims at (i) producing viable and consistent results without the need for detailed data on the bus lines; (ii) addressing multi-stage trips, so that it is, to some extent, a multimodal simulator and analysis tool for matching mobility demand vs. transportation offer. The proposed model and tool have been developed in the context of a research and development project called MOSAIC, funded by the Tuscany Region (Italy), with relevant international partners including ALSTOM (the coordinator), DISIT Lab of UNIFI (the authors), Municipia/Engineering, TAGES, and the CNIT national research center. The model and tool have been built by exploiting the Km4City knowledge model, and validated by using the data and services provided by Snap4City. The input and contextual data are those covering the Tuscany region, and in particular the Florence metropolitan area (Florence being the capital of the region), with about 1.5 million inhabitants.

Requirements and Data Source Analysis

In this section, the main requirements of a simulation tool for matching the demand and offer of mobility are discussed. Among them, the analysis of the data sources is particularly relevant, because the tool has to be flexible enough to cope with different kinds of data sources. The tool must also be able to model the demand starting from (and taking into account) a range of different data that may correlate with it. In fact, in urban areas, daily commuters travel for different purposes depending on their activities (e.g., work, study). Different points and regions of interest therefore need to be carefully investigated to evaluate how attractive they are as commuter destinations in the considered time slot. Hence, a broad domain of data, including places which provide services (work and study, in our scenario) and household data (e.g., residential buildings), must be considered upfront. Moreover, different data regarding daily trips need to be assessed, namely outbound and inbound trips with different purposes, and inter-area trips. In addition, other static and dynamic data, including the geolocation of the area and the daily bus trip schedule (e.g., stop names, stop geometry, arrival times), are vital for an efficient offer-demand analysis. Also, in urbanized areas, especially in metropolitan areas with several bus lines and mobility operators, a considerable share of daily trips are multi-stage ones, where commuters need to take more than one bus to reach their destinations. Therefore, a model matching service offer and demand must also consider such trips to increase its precision.
Moreover, the analysis tool has to be fast enough to allow WHAT-IF analysis, carrying out a large number of simulations and assessing them on the basis of some Key Performance Indicators (KPIs), e.g., the maximum number of people on the bus, the number of people moved from the area, and the maximum number of people at the bus stop. It is noted that analyzing different scenarios with different input parameters (e.g., area, date, day time, time slot size), which requires digesting a large amount of data, can be notably time-consuming.
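As an illustration of how trip-level KPIs such as the maximum number of people on the bus can be derived once pick-ups and drop-offs per stop are available, the following Python sketch accumulates the on-board load stop by stop. The data structure, the KPI names, and the choice of total pick-ups as a proxy for "people moved" are hypothetical assumptions for this example, not the ODA implementation.

```python
from dataclasses import dataclass


@dataclass
class StopCounts:
    """Pick-ups and drop-offs of one bus trip at one stop (hypothetical structure)."""
    stop: str
    pickups: int
    dropoffs: int


def load_profile(stops):
    """On-board load after each stop: load_k = load_{k-1} - dropoffs_k + pickups_k.
    Passengers are assumed to alight before new passengers board."""
    load = 0
    profile = []
    for s in stops:
        if s.dropoffs > load:
            raise ValueError(f"more drop-offs than passengers on board at {s.stop}")
        load = load - s.dropoffs + s.pickups
        profile.append(load)
    return profile


def trip_kpis(stops):
    """Two illustrative KPIs for a single bus trip."""
    profile = load_profile(stops)
    return {
        "max_on_board": max(profile, default=0),
        # Assumption: total pick-ups used as a proxy for people moved by the trip.
        "people_moved": sum(s.pickups for s in stops),
    }
```

For a toy trip with stops A (10 pick-ups), B (5 pick-ups, 3 drop-offs), and C (12 drop-offs), the load profile is [10, 12, 0], so the maximum on-board load is 12. Because each KPI is a cheap pass over the per-stop counts, many WHAT-IF scenarios can be compared once those counts have been simulated.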

ODA System Architecture

In this section, the general architecture of the ODA model is reported (see Figure 1). The main components of the architecture are the data sources, algorithms to transform some of the data into Origin-Destination Matrices (ODMs) when possible, the simulator with its algorithms, the integration with Snap4City/Km4City tools via the Smart City API, and some visualization tools for presenting the results [12].

Figure 1: The model architecture

 The main input data categories include:

  • Service Demand Data, describing people flows in the area, which can be gathered from different sources (e.g., WiFi networks, cellular networks, traffic, census). Such data are usually produced by different operators (e.g., mobility, telecommunication).
  • Service Offer Data, describing potential people flows in the studied area. Such data can be gathered from different sources, including trip schedules (e.g., stop list, arrival time list, GTFS), stop information (e.g., name, geolocation), and route information, just to mention a few. In this work, thanks to the variety of data supported by the Km4City knowledge model, we adopt it as the source for gathering service offer data.
  • Aggregation and Production Motivations for People Flows, describing points and areas where people may start or end their trips, for example residential buildings, touristic areas, and aggregation points (e.g., offices, shopping areas/centers, universities, schools, factories, cinemas, swimming pools), just to mention a few.

All these kinds of data may be converted into ODMs by specific algorithms implemented in the Conversion Tools. An ODM describes the number of people who could, would, or actually do move from one area to another; the specific meaning of an ODM depends on its data source, but structurally ODMs are substantially similar. An ODM can also be composed from a combination of different data sources, each contributing with a (possibly) different share.
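The composition of an ODM from several sources with different shares can be sketched as a share-weighted sum of sparse matrices. The following Python example assumes that ODMs are stored as dictionaries keyed by (origin, destination) pairs and that shares are normalized to sum to 1; both the representation and the linear combination rule are assumptions made for illustration, since the combination rule used by the Conversion Tools is not specified here.

```python
from collections import defaultdict


def combine_odms(weighted_odms):
    """Combine ODMs given as {(origin, destination): flow} dicts.

    weighted_odms is a list of (odm, share) pairs; shares are normalized
    to sum to 1, and the result is the share-weighted sum of the flows.
    """
    total_share = sum(share for _, share in weighted_odms)
    if total_share <= 0:
        raise ValueError("shares must sum to a positive value")
    combined = defaultdict(float)
    for odm, share in weighted_odms:
        weight = share / total_share
        for od_pair, flow in odm.items():
            # Pairs missing from one source simply contribute zero flow.
            combined[od_pair] += weight * flow
    return dict(combined)
```

With toy flows of 100 people from area A to B in a census-like ODM and 60 in a WiFi-like ODM, shares of 0.7 and 0.3 give a combined A-to-B flow of 88. Using a sparse dictionary keeps the sketch independent of how many localities are involved, since most origin-destination pairs carry no flow.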

The Simulator performs the demand-vs-offer analysis as described in the next section. Please note that analyzing different scenarios with different input data and configuration parameters (e.g., studied area, date, day time) is a time-consuming process. For example, with the current Km4City configuration, it takes around  minutes to analyze a single bus trip which passes through  stops in the central part of the Florence metropolitan area. Therefore, considering more than  daily bus trips,  bus stops,  residential buildings, and  service providers, it takes two to three days to thoroughly analyze the service offer and demand in the Florence metropolitan area using the proposed model. To overcome this, we adopt a fast computational strategy that performs a large number of simulation scenarios in advance, stores the results, and uses them for visualization and further analysis. Other tools developed in MOSAIC can also access the results through the simulator's Smart City API.
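The precompute-and-store strategy described above can be sketched as follows: every combination of scenario parameters is simulated once in a batch, and later requests are served from the stored results instead of re-running the simulation. In this Python example, `simulate` is a hypothetical callable standing in for the ODA simulator, and the (area, date, time slot) key is an assumed subset of the real configuration parameters.

```python
from itertools import product


def precompute_scenarios(areas, dates, time_slots, simulate):
    """Run every (area, date, time_slot) scenario once and store its result.

    simulate is a hypothetical callable standing in for the ODA simulator;
    it is called exactly once per scenario.
    """
    return {
        (area, date, slot): simulate(area, date, slot)
        for area, date, slot in product(areas, dates, time_slots)
    }


def lookup(store, area, date, slot):
    """Serve a stored result instead of re-running the simulation."""
    try:
        return store[(area, date, slot)]
    except KeyError:
        raise KeyError(f"scenario not precomputed: {area}, {date}, {slot}") from None
```

The design trades storage for latency: the expensive simulations run offline, while dashboards or other tools only perform lookups. In practice the store would be persisted (e.g., in a database behind an API) rather than kept in memory as in this sketch.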

Visualization tools are responsible for presenting customized simulation results in a smart city dashboard (as in Snap4City). The Result Manager analyzes the results to be visualized in the dashboard based on the criteria selected by the user.


Model Testing and Validation

Table 2 presents the model input setup. The numbers of morning  and afternoon  outbound trips are obtained from the morning and afternoon demand ODMs, respectively, considering   localities in the Tuscany region. The radius   of the circle  , which was selected experimentally to obtain the best results, is set to  . It is worth noting that, to provide more acceptable and realistic results, our model has been tested and validated on typical working days (i.e., neither holidays nor weekends). Therefore, in this case, we focus on the census data, since they are the main source of data on people moving around the city. The census data are publicly available from the Italian National Institute of Statistics and the Region Tuscany digital portal. The metropolitan area is served by more than   different public transportation operators for more than   million inhabitants/residents, with   daily trips for work or study,   million tourists per year,   vehicles entering the city daily (and an equal number exiting), and around   inhabitants in the central part of the city, where buses are massively deployed.
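The circle of radius r is not defined in this excerpt; assuming it is a catchment area centered on a bus stop and used to select nearby places (e.g., residential buildings and POIs), the selection can be sketched with the haversine great-circle distance. The tuple layout and function names below are hypothetical, and the radius is left as a parameter because its value is not reported here.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (spherical model)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against tiny rounding errors pushing sqrt(a) above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def inside_circle(stop, places, radius_m):
    """Return the places whose distance from the stop is at most radius_m.

    stop is (lat, lon); places are (name, lat, lon) tuples (hypothetical layout).
    """
    lat0, lon0 = stop
    return [p for p in places if haversine_m(lat0, lon0, p[1], p[2]) <= radius_m]
```

At city scale the spherical approximation is more than accurate enough: one degree of latitude corresponds to about 111.2 km, so a place 0.001 degrees north of a stop lies roughly 111 m away. Varying `radius_m` is one way to reproduce the kind of experimental tuning of the radius mentioned above.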











Figure 9: The area, bus stops, and bus lines, considered for testing and validating the model

The model components (implemented in Java) ran on a PC with an Intel Xeon  CPU at   GHz and   GB of RAM. For model validation, we consider four popular stops in the central area of Florence (see Figure 9), an area which includes a considerable number of POIs, residential buildings, bus stops, bus lines, and bus trips (i.e., instances of bus lines). To validate the proposed model, a field observation was performed in four different time intervals, both in the morning and in the afternoon. Table 3 shows different criteria to evaluate the complexity of the validation process. As can be seen, the proposed model was adequately tested and validated with respect to each considered criterion.

Figure 10 compares the actual numbers of pick-ups and drop-offs with those computed by our model and tool at four selected stops (Santa Maria Maggiore, Santo Spirito, Verdi, and Porta Rossa) in four different time intervals. In these experiments, the model accuracy was evaluated using R-square. Considering the R-square values obtained for pick-ups and drop-offs (  and  , respectively), the results demonstrate that the model can provide a satisfactory contribution to offer-demand analysis problems in public transport scenarios.

Figure 10: Actual (black bars) vs computed (gray bars) pick-ups and drop-offs, according to the field observation and the proposed model, respectively.
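The text does not state which definition of R-square is used. The following Python sketch assumes the common coefficient of determination, R^2 = 1 - SS_res / SS_tot, computed over paired observed and computed counts (e.g., one pair per stop and time interval). Under this definition R^2 equals 1 for a perfect match and can become negative for very poor predictions; an alternative is the squared Pearson correlation between observed and computed values, which would give different numbers.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    mean = sum(actual) / len(actual)
    # Residual sum of squares: error of the computed values.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: variability of the observed values.
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot
```

For example, observed counts [1, 2, 3] predicted as [2, 2, 2] give R^2 = 0, i.e., no better than predicting the mean. Computing the value separately for pick-ups and for drop-offs matches the two figures reported above.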


Providing public transport services with suitable quality is an essential challenge in urban environments. An important step is the evaluation of a mass transport network by comparing the service offer and demand. To address this step, in this work, a model is provided to analyze transport scenarios when daily commuter tracking data are not available. To do so, first, the service offer is evaluated by estimating the number of people who can be moved from one locality to another. Next, the service demand is analyzed by evaluating: 1) the number of people who move daily from one locality to another for work or study; and 2) stop popularity. Finally, to compare the service offer and the service demand, the number of people who are picked up and dropped off at stops is estimated. The proposed solution can be adopted to analyze the status of the public transportation services and to detect potential issues (e.g., overloaded bus stops and bus trips) in case of changes that may emerge (e.g., blocked stops, out-of-service bus trips). Our work leaves space for future research and developments. In particular, this work focuses on the bus as the mode of transportation. This choice is consistent with our simulation experiment because in the Florence metropolitan area, at the time of writing, there are only two tram lines and no subway service; therefore, the bus can be considered the main mode of public transportation. An interesting direction is investigating multimodal scenarios by considering other public (e.g., tram, subway) or even private (e.g., taxi) modes of transportation. Also, considering other metrics for the analysis (e.g., headway, the number of on-board commuters) is a promising direction for future work, in particular for WHAT-IF analysis.


The authors would like to thank the MOSAIC project of the Tuscany Region, and all the partners involved, for their support and partial funding (ALSTOM, Municipia/Engineering, Tages, CNIT). Snap4City and Km4City are open technologies of DISIT Lab.


References

[1] M. Drut. 2018. Spatial issues revisited: The role of shared transportation modes. Transp. Policy, vol. 66, pp. 85–95.

[2] S. Jain, P. Aggarwal, P. Kumar, S. Singhal, and P. Sharma. 2014. Identifying public preferences using multi-criteria decision making for assessing the shift of urban commuters from private to public transport: A case study of Delhi. Transp. Res. Part F Traffic Psychol. Behav., vol. 24, pp. 60–70.

[3] M. Arnone, T. Delmastro, G. Giacosa, M. Paoletti, and P. Villata. 2016. The Potential of E-ticketing for Public Transport Planning: The Piedmont Region Case Study. Transp. Res. Procedia, vol. 18, pp. 3–10.

[4] C. Wang and H. Xuan. 2006. A Fair Off-line Electronic Cash Scheme Based on RSA Partially Blind Signature. First International Symposium on Pervasive Computing and Applications, 2006, pp. 508–512.

[5] S. Shang, D. Guo, J. Liu, and K. Liu. 2014. Human Mobility Prediction and Unobstructed Route Planning in Public Transport Networks. IEEE 15th International Conference on Mobile Data Management, 2014, vol. 2, pp. 43–48.

[6] J. Zhang, D. Shen, L. Tu, F. Zhang, C. Xu, Y. Wang, and C. Tian. 2017. A Real-Time Passenger Flow Estimation and Prediction Method for Urban Bus Transit Systems. IEEE Trans. Intell. Transp. Syst., vol. 18, pp. 3168–3178, Nov.

[7] A. Horni, K. Nagel, and K. W. Axhausen. 2016. The multi-agent transport simulation MATSim. Ubiquity Press, London.

[8] D. Krajzewicz, J. Erdmann, M. Behrisch, and L. Bieker-Walz. 2012. Recent Development and Applications of SUMO - Simulation of Urban MObility. Int. J. Adv. Syst. Meas., vol. 3&4, pp. 128–138.

[9] L. Smith, R. Beckman, D. Anson, K. Nagel, and M. Williams. 1995. TRANSIMS: Transportation analysis and simulation system. Los Alamos National Laboratory, United States.

[10] K. Sohn and D. Kim. 2008. Dynamic Origin-Destination Flow Estimation Using Cellular Communication System. IEEE Trans. Veh. Technol., vol. 57, pp. 2703–2713.

[11] P. Bellini, D. Cenni, P. Nesi, and I. Paoli. 2017. Wi-Fi based city users’ behaviour analysis for smart city. J. Vis. Lang. Comput., vol. 42, pp. 31–45.

[12] C. Badii, P. Bellini, P. Nesi, and M. Paolucci. 2018. A smart city development kit for designing web and mobile app. IEEE SmartWorld, 28 June 2018. doi:10.1109/UIC-ATC.2017.8397569.

[13] P. Bellini, S. Bilotta, P. Nesi, M. Paolucci, and M. Soderi. 2018. Real-Time Traffic Estimation of Unmonitored Roads. IEEE DataCom 2018, Athens. doi:10.1109/DASC/PiCom/DataCom/CyberSciTec.2018.000-6.

A complete version of this paper has been published and presented at ACM TESCA: Ala Arman, Pierfrancesco Bellini, Paolo Nesi, Michela Paolucci, "Analyzing Public Transportation Offer wrt Mobility Demand", ACM Workshop on Technology Enablers and Innovative Applications for Smart Cities and Communities (TESCA 2019), November 10, 2019, Columbia University, New York, USA.