Comparative Analysis of Machine Learning Models for Prediction of Surface Water Quality

IJEP 44(4): 360-368, Vol. 44, Issue 4 (April 2024)

Shubham Shivhare and Atul Sharma*

Jabalpur Engineering College, Department of Civil Engineering, Jabalpur – 482 011, Madhya Pradesh, India


Accurate prediction of surface water quality is an important task in water resource management and pollution control. Pollution is considered a byproduct of development and urbanization, especially in developing countries, where disposal of waste into waterbodies is common and waste disposal practices are not well advanced. The traditional method of calculating the water quality index is laborious and time-consuming and makes large amounts of data difficult to manage. In this work, we used three machine learning algorithms for real-time prediction of river water quality: the decision tree (DT) classifier, the support vector machine (SVM) and the random forest (RF) classifier. Machine learning models were chosen because they offer advantages over the traditional method and handle large datasets effectively and efficiently. The water quality index (WQI) was used as the indicator that defines the water quality classes. We analyzed five years of monthly data (24/04/16 to 02/03/21) from the Jamtara sampling station on the river Narmada, Jabalpur, Madhya Pradesh, India, using ten important water quality parameters as input. The study aims to identify an accurate algorithm and to compare the accuracy of the algorithms used. On the test data, random forest performed best, with an accuracy of 86.67%, compared with 75% for the support vector machine and 66.67% for the decision tree on this dataset.


Keywords: Decision tree classifier, Machine learning, Random forest, Support vector machine, Water quality index
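The abstract turns WQI values into water quality classes and scores each classifier by its accuracy on test data. A minimal sketch of both steps follows, assuming the weighted arithmetic WQI (Brown et al., 1970) and commonly used class bands; the paper's exact WQI formulation, parameters and thresholds are not given here. The `Parameter` class, the three parameters and all their values are hypothetical and for illustration only.

```python
"""Sketch: weighted arithmetic WQI, class labelling and accuracy (assumed method)."""

from dataclasses import dataclass


@dataclass
class Parameter:
    name: str
    observed: float  # measured value V_i
    standard: float  # permissible standard S_i (illustrative value, assumption)
    ideal: float     # ideal value V_ideal (0 for most parameters)


def weighted_arithmetic_wqi(params):
    """WQI = sum(q_i * w_i) / sum(w_i), with w_i = K / S_i."""
    k = 1.0 / sum(1.0 / p.standard for p in params)  # proportionality constant K
    num = 0.0
    den = 0.0
    for p in params:
        q = 100.0 * (p.observed - p.ideal) / (p.standard - p.ideal)  # quality rating q_i
        w = k / p.standard                                           # unit weight w_i
        num += q * w
        den += w
    return num / den


def wqi_class(wqi):
    """Map a WQI score to a class label (assumed bands, not taken from the paper)."""
    if wqi <= 25:
        return "Excellent"
    if wqi <= 50:
        return "Good"
    if wqi <= 75:
        return "Poor"
    if wqi <= 100:
        return "Very poor"
    return "Unsuitable"


def accuracy(y_true, y_pred):
    """Share of test samples whose predicted class equals the WQI-derived class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    # Hypothetical monthly sample with three illustrative parameters.
    sample = [
        Parameter("pH", observed=7.8, standard=8.5, ideal=7.0),
        Parameter("DO (mg/L)", observed=7.0, standard=5.0, ideal=14.6),
        Parameter("BOD (mg/L)", observed=2.0, standard=5.0, ideal=0.0),
    ]
    score = weighted_arithmetic_wqi(sample)
    print(f"WQI = {score:.2f} -> {wqi_class(score)}")  # prints: WQI = 58.16 -> Poor
```

Because w_i = K/S_i, the weights sum to 1 by construction, so the denominator is always 1; it is kept only to match the usual written form of the formula. The class labels produced by `wqi_class` would serve as the targets that the DT, SVM and RF classifiers learn to predict, and `accuracy` is the metric reported in the abstract.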

