A Survey on Statistical Methods used in Air Quality Prediction

IJEP 42(5): 549-558 : Vol. 42 Issue. 5 (May 2022)

Tripta Narayan1, Tanushree Bhattacharya2, Soubhik Chakraborty3*, and Swapan Konar1

1. Birla Institute of Technology, Department of Physics, Mesra, Ranchi – 835 215, Jharkhand, India
2. Birla Institute of Technology, Department of Civil and Environmental Engineering, Mesra, Ranchi – 835 215, Jharkhand, India
3. Birla Institute of Technology, Department of Mathematics, Mesra, Ranchi – 835 215, Jharkhand, India


Air quality is now a matter of prime concern. Air is termed polluted when it is contaminated or when the concentration of some of its constituents exceeds permissible values. Air pollution can harm ecosystems as well as the natural conditions on which human life depends. This situation has motivated significant research, much of it focused on the prediction of air quality, because prediction provides a basis for taking effective precautionary pollution control measures. This article reviews the statistical techniques used for the analysis and prediction of air pollution. Databases were searched for relevant literature published during the past decade. The studies were reviewed and the methodologies adopted were analysed by comparing their advantages and disadvantages. The review finds that non-linear techniques predict air pollution better than linear techniques. Among the techniques developed so far, multivariate linear regression analysis is the most widely used. Artificial neural networks (ANN), support vector machines (SVM) and hybrid models have shown the potential for better prediction in future. There remains scope to improve the accuracy of prediction. The area is therefore open, unsaturated and promising, and it is hoped that the present review will provide helpful guidelines for future researchers in this domain.


Autoregressive integrated moving average, forecasting, kriging, multivariate linear regression analysis, air pollution

