Benchmarking Travel Time and Demand Prediction Methods Using Large-scale Metro Smart Card Data
Urban mass transit systems generate large volumes of data through the automated systems that support ticketing, signalling, and other operational processes. This study is motivated by the observation that, despite the availability of sophisticated quantitative methods, most public transport operators remain limited in their ability to exploit the information their datasets contain. This paper aims to address this gap in the context of real-time demand and travel time prediction with smart card data. We benchmark the predictive performance of four quantitative prediction methods: multivariate linear regression (MVLR) and semiparametric regression (SPR), both widely used in the econometric literature, and random forest regression (RFR) and support vector machine regression (SVMR), both from machine learning. We find that the SVMR and RFR methods are the most accurate in travel flow and travel time prediction, respectively. However, we also find that SPR requires less computation time than the two machine learning methods, at the cost of a minor loss in predictive power.