An Empirical Analysis of Machine Learning Models for Developing a Custom Air Quality Forecaster

IJEP 42(11): 1299-1309 : Vol. 42 Issue. 11 (November 2022)

Tanya Garg and Daljeet Singh Bawa*

Bharati Vidyapeeth University, Bharati Vidyapeeth’s Institute of Management and Research, New Delhi – 110 063, India


The growing number of air pollution events over the last decade has made preventive measures a necessity rather than a precaution. An effective air quality forecasting system is the backbone of all such measures. A detailed global literature review identifies statistical and ensemble models as the more efficient choices for forecasting problems. This study experimentally analyzes 10 machine learning models for predicting PM2.5, PM10, SO2, CO, NO2 and O3, with the aim of developing a custom air quality forecaster for Delhi, India. The performance of the models was compared separately for each pollutant using evaluation metrics such as root mean square error (RMSE), mean absolute error (MAE) and median absolute error (MedAE). Based on the experimental evidence, we conclude that a time-series-based deep neural network model performs best in the given scenario and can be explored further to create a custom air quality forecasting framework.
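The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the study's implementation: the function names and the sample concentration values are hypothetical, chosen only to show how each metric is computed from observed and predicted values for a single pollutant.

```python
import math
import statistics


def rmse(actual, predicted):
    """Root mean square error: penalizes large errors more heavily."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: average size of the error."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


def medae(actual, predicted):
    """Median absolute error: less sensitive to a few extreme errors."""
    return statistics.median(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical observed and predicted PM2.5 values (illustrative only).
actual = [100.0, 120.0, 90.0, 110.0]
predicted = [110.0, 115.0, 95.0, 100.0]

scores = {
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
    "MedAE": medae(actual, predicted),
}
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Lower values indicate a better fit for all three metrics, so under this kind of comparison the model with the lowest score for a given pollutant would be ranked first for that pollutant.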


Air quality forecasting, Atmospheric pollution forecast, Delhi air pollution, Machine learning, Deep learning, LSTM model

