Advanced Sequential Learning Models for IoT Intrusion Detection: A Comparative Analysis of Transformer, GRU and LSTM Architectures
Oluwapelunmi Bankole *
Department of Management Information Systems, Lee Business School, University of Nevada Las Vegas, Las Vegas, NV 89154, USA.
*Author to whom correspondence should be addressed.
Abstract
The rapid proliferation of Internet of Things (IoT) devices has introduced unprecedented security challenges, making network intrusion detection systems (NIDS) critical for safeguarding IoT infrastructures. Traditional machine learning approaches have shown promise, but the complex and evolving nature of IoT network attacks demands more sophisticated detection mechanisms. This study presents a comparative analysis of four sequential learning models for IoT intrusion detection: Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Transformer, and a novel Hybrid Transformer-GRU architecture. Using the UNSW-NB15 dataset of 257,673 network flow records, we implemented a preprocessing pipeline that applies the Synthetic Minority Over-sampling Technique (SMOTE) for class balancing and Analysis of Variance (ANOVA) for feature selection, reducing the feature space from 47 to 36 dimensions. In our experiments, the Transformer architecture performed best, with 88.09% accuracy, a 90.52% F1-score, a ROC-AUC of 0.9802, a precision of 98.82%, and an overfitting rate of only 0.41%. The Hybrid Transformer-GRU model followed closely with 88.00% accuracy, and both the LSTM and GRU architectures exceeded 87% accuracy. The GRU model was the most computationally efficient, training in 15.9 minutes compared with 208 minutes for the Transformer, which makes it suitable for resource-constrained IoT environments. All models generalized well, with overfitting rates below 0.5%. These findings advance the state of the art in IoT security by demonstrating that attention-based mechanisms can enhance intrusion detection performance, and they provide practical guidance for selecting a model based on accuracy-efficiency trade-offs in real-world IoT deployments.
Hyperparameters were tuned through systematic evaluation, and all models were assessed on accuracy, precision, recall, F1-score, and ROC-AUC; the experimental results section presents the detailed analysis.
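The abstract reports precision and F1-score for the Transformer but not recall. As an illustrative sketch only, the Python below shows how the standard binary metrics relate to confusion-matrix counts and how recall can be recovered from precision and F1 through the harmonic-mean definition. The function names and the example counts are hypothetical. The sketch assumes the reported precision and F1-score refer to the same positive (attack) class without averaging across classes, which the abstract does not state.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def recall_from_precision_f1(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, giving R = F1 * P / (2P - F1)."""
    denom = 2 * precision - f1
    if denom <= 0:
        raise ValueError("no valid recall exists for this precision/F1 pair")
    return f1 * precision / denom


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    print(binary_metrics(tp=8, fp=2, tn=85, fn=5))
    # Precision and F1 as reported in the abstract for the Transformer.
    print(recall_from_precision_f1(0.9882, 0.9052))
```

Under the stated assumption, the reported precision of 98.82% and F1-score of 90.52% imply a recall of roughly 83.5%, meaning the model rarely raises false alarms but misses some attacks.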
Keywords: Internet of Things, intrusion detection system, deep learning, LSTM, GRU, transformer, attention mechanism, network security, UNSW-NB15, cybersecurity