Evaluation of Machine Learning Models for Predicting Flood Susceptibility Using Spatial and Socio-Environmental Attributes in Lagos, Nigeria
Zainab Akinsemoyin
Applied Geography, Georgia Southern University, Georgia, USA.
Oluwaseun Peter Adeoye
Department of Civil Engineering, University of Ibadan, Nigeria.
Chidiebere Anastacia Ezeh
North Dakota State University, USA.
Eniola Onatayo
Department of Environmental Resources Engineering, State University of New York College of Environmental Science and Forestry, USA.
Oghogho Favour Aisosa
Environmental Management and Toxicology, University of Benin, Nigeria.
Confidence Adimchi Chinonyerem *
Abia State Polytechnic, Nigeria.
*Author to whom correspondence should be addressed.
Abstract
Flooding is a major challenge to urban resilience in Lagos, Nigeria, one of the world's fastest-growing coastal megacities and one that is highly exposed to extreme rainfall and sea-level rise. This paper compares the performance of four machine learning classifiers, namely Random Forest (RF), Extreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), and Artificial Neural Network (ANN), for flood extent mapping using multi-sensor, multi-temporal satellite data. Sentinel-1 SAR backscatter (VV, VH), Sentinel-2 NDVI, and SRTM-derived elevation were fused to create predictive features. Ground-truth labels came from expert digitization of flooded and non-flooded areas, validated against LASEMA flood reports (2022–2023). Preprocessing included speckle filtering, atmospheric correction, terrain correction, and co-registration to 10 m resolution. Models were trained and tested with spatial block 5-fold cross-validation to limit the effect of spatial autocorrelation, and were compared on accuracy, precision, recall, F1-score, and ROC-AUC; differences between ROC curves were tested statistically with the DeLong test. Results indicate that the two ensemble models, RF and XGBoost, outperformed the other classifiers. RF produced the highest recall (0.93) and ROC-AUC (0.972), making it the best at identifying flooded pixels, while XGBoost produced the highest precision (0.92), reducing false alarms. SVM and ANN consistently trailed both, with accuracies below 0.90. Feature importance analysis identified SAR backscatter as the strongest predictor, with NDVI and elevation providing complementary information. Spatial susceptibility mapping indicated that almost 50% of Lagos falls within high to very high flood-risk zones, particularly low-lying coastal areas such as Lekki, Victoria Island, and Ajegunle.
This study demonstrates that ensemble learning on fused multi-sensor satellite data offers a scalable and robust approach to detecting floods in complex urban settings. The results support incorporating ML-based flood mapping into the disaster management and urban planning policies of Lagos and other Sub-Saharan African cities.
Keywords: Flood mapping, machine learning, urban flooding, multi-temporal satellite data, multi-sensor data fusion, synthetic aperture radar, disaster risk management
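The spatial block cross-validation and the pixel-level metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the square block layout, the round-robin assignment of blocks to folds, the block size, and the function names `block_id`, `spatial_block_folds`, and `precision_recall_f1` are all assumptions made for illustration. The idea it shows is that neighbouring pixels are kept together in one fold, so that spatially correlated pixels do not appear in both the training and the test set.

```python
from collections import defaultdict


def block_id(row, col, block_size):
    """Map a pixel (row, col) to the square block that contains it (assumed layout)."""
    return (row // block_size, col // block_size)


def spatial_block_folds(pixels, block_size, n_folds=5):
    """Yield (train_idx, test_idx) index lists; each block lies wholly in one fold."""
    blocks = defaultdict(list)
    for i, (row, col) in enumerate(pixels):
        blocks[block_id(row, col, block_size)].append(i)
    # Hypothetical assignment rule: blocks are dealt to folds in round-robin order.
    fold_of = {b: k % n_folds for k, b in enumerate(sorted(blocks))}
    for fold in range(n_folds):
        test = [i for b, idx in blocks.items() if fold_of[b] == fold for i in idx]
        train = [i for b, idx in blocks.items() if fold_of[b] != fold for i in idx]
        yield train, test


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the flooded class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy 20 x 20 pixel grid split into 4 x 4 pixel blocks (25 blocks, 5 per fold).
pixels = [(r, c) for r in range(20) for c in range(20)]
folds = list(spatial_block_folds(pixels, block_size=4))
```

In practice each `(train, test)` pair would index the fused feature table (VV, VH, NDVI, elevation), a classifier would be fitted on the training pixels, and `precision_recall_f1` would be evaluated on the held-out blocks. High recall corresponds to few missed flooded pixels, and high precision to few false alarms.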