Road Traffic Accident Prediction by Machine Learning and GIS: Case Study in Thanh Hoa Province, Vietnam
Abstract
Traffic accidents pose significant challenges for communities worldwide, particularly in Vietnam, where many people live on low to middle incomes and infrastructure often struggles to keep pace with rapid motorization. This study uses a historical data set of traffic accidents recorded from 2020 to 2023 in Thanh Hoa province, Vietnam, as input to two machine learning models, Random Forest and Spline Regression, to predict the number of deaths and injuries from traffic accidents. A traffic accident prediction map for 2024 is produced by combining the death and injury counts predicted by Random Forest with GIS technology. The predictions provide detailed information that can guide interventions to improve traffic safety, especially in high-risk areas and during peak accident periods. The Random Forest model outperformed Spline Regression, achieving a mean absolute error of 0.012072 for deaths and 0.036323 for injuries, with R² values of 0.998663 and 0.996552, respectively. Including lagged variables and adjusting for seasonal effects further improved the accuracy of daily predictions. The study offers an approach to reducing traffic accidents in low- and middle-income countries, where prediction methods based on historical accident data are still not widely used, in the hope that machine learning and GIS will be applied to road safety management in the near future.
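To make the lagged variables and the two reported metrics concrete, the following is a minimal sketch in plain Python. It is not the authors' code: the function names, the choice of lags (1 and 7 days), and the sample counts are assumptions made for illustration only. In practice the lagged rows would be passed to a Random Forest implementation (for example, `RandomForestRegressor` in scikit-learn, which is also an assumption, since the abstract does not name a library).

```python
def add_lags(series, lags=(1, 7)):
    """Turn a daily count series into (lag features, target) rows.

    For each day t, the features are the counts observed `k` days
    earlier for every k in `lags`; the target is the count on day t.
    Days without a full lag history are skipped.
    """
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series)):
        features = [series[t - k] for k in lags]
        rows.append((features, series[t]))
    return rows


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    errors = [abs(a - b) for a, b in zip(y_true, y_pred)]
    return sum(errors) / len(errors)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - y_bar) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical daily injury counts (illustrative values, not from the study).
daily_injuries = [2, 0, 1, 3, 1, 0, 4, 2, 1, 1]

# Each row pairs [count 1 day ago, count 7 days ago] with the current count.
rows = add_lags(daily_injuries, lags=(1, 7))
```

A seasonal adjustment could be added in the same way, for instance by appending the month or day of the week to each feature list; which seasonal variables the study actually used is not stated in the abstract.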

