Closed Pattern Mining And Causal Analysis Of Pollution Data

IJEP 41(1): 42-49 : Vol. 41 Issue. 1 (January 2021)

S. Sharmiladevi and S. Siva Sathya*

Pondicherry University, Department of Computer Science, Puducherry – 605 014, India


Sequential pattern mining has gained great importance in recent years, as it unveils previously unknown associative relationships between observations. Mining sequential patterns requires generating many intermediate sequences, which makes it computationally more challenging than frequent pattern mining. CloFAST is an algorithm that mines closed sequences without candidate maintenance, and it requires only one step to check closure and prune the search space. It can therefore mine long closed sequences from large datasets efficiently. In this work, closed sequential pattern mining of the PM2.5 pollutant in Delhi is carried out using CloFAST. Delhi, the capital of the second most populous country on earth, has been suffering from a severe air pollution problem. The city is polluted for diverse reasons, such as its geography, the burning of crop stubble in neighbouring states and vehicular emissions. Some of the critical air pollutants found in Delhi are PM10, PM2.5, nitrogen oxide, sulphur oxide, carbon monoxide and ozone. The main pollutant is particulate matter (PM2.5), as it causes serious health problems when it enters the alveoli of human lungs. Various micro-level analyses of air pollution have been carried out recently, but macro-level analysis is also required in order to obtain a clear understanding on a broader scale. The patterns obtained are given as knowledge for a causal analysis performed using the FCI algorithm.
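To make the notion of a closed sequential pattern concrete, the following is a minimal brute-force sketch in Python. It is not CloFAST and does not use its sparse and vertical id-lists; it only illustrates what "closed" means, namely that a frequent pattern is kept when no longer pattern containing it has the same support. The function names, the discretised PM2.5 level labels and the toy database are all hypothetical and are not taken from the study.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def frequent_patterns(database, min_support):
    """Brute-force enumeration: map each frequent subsequence to its support.

    Support is the number of sequences in `database` that contain the pattern.
    Exponential in sequence length, so only suitable for tiny examples.
    """
    candidates = set()
    for seq in database:
        for r in range(1, len(seq) + 1):
            for idx in combinations(range(len(seq)), r):
                candidates.add(tuple(seq[i] for i in idx))
    support = {}
    for pattern in candidates:
        count = sum(is_subsequence(pattern, seq) for seq in database)
        if count >= min_support:
            support[pattern] = count
    return support


def closed_patterns(support):
    """Keep patterns with no longer super-sequence of equal support."""
    return {
        p: s
        for p, s in support.items()
        if not any(
            len(q) > len(p) and sq == s and is_subsequence(p, q)
            for q, sq in support.items()
        )
    }


# Hypothetical toy database: each tuple is a sequence of daily PM2.5 levels.
database = [
    ("moderate", "high", "severe"),
    ("moderate", "high", "severe", "high"),
    ("low", "moderate", "high"),
]

closed = closed_patterns(frequent_patterns(database, min_support=2))
for pattern, count in sorted(closed.items(), key=lambda kv: len(kv[0])):
    print(" -> ".join(pattern), "| support:", count)
```

With a minimum support of 2, seven subsequences of this toy database are frequent but only two are closed, which shows how the closed representation condenses the result without losing support information.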


Air pollution, PM2.5 pollutant, Closed sequential mining, Particulate matter, Delhi

