A COMPARATIVE STUDY OF STATISTICAL AND INTELLIGENT CLASSIFICATION MODELS FOR PREDICTING DIABETES
Keywords:
classification models, logistic regression, regression neural networks.
DOI:
https://doi.org/10.17654/0972361723046
Abstract
Classification tasks play a pivotal role in various domains, including healthcare, finance, and marketing. Accurate classification models can drive decision-making and provide valuable insights. While logistic regression has long been a standard method for classification, recent advances in intelligent models have produced more sophisticated techniques. This study explores and compares five classification models: logistic regression, robust logistic regression, adaptive splines regression, k-nearest neighbor and regression neural networks. A review of the literature is presented, and the five models are compared using seven performance measures: MSE, MAE, RMSE, COV, CC, EC and accuracy. Furthermore, this study provides a foundation for future research on developing more efficient and accurate classification models and on investigating advanced ensemble techniques that leverage the strengths of different models.
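As an illustration of how several of the performance measures named in the abstract can be computed for a binary classifier, the following is a minimal Python sketch. It rests on assumptions that the abstract does not state: that the measures are evaluated on predicted probabilities against 0/1 labels, that CC denotes the Pearson correlation coefficient, and that predictions are thresholded at 0.5 for accuracy. COV and EC are left out because the abstract does not define them. The function name `evaluate` and the sample data are hypothetical.

```python
import math


def evaluate(y_true, y_prob, threshold=0.5):
    """Compute MSE, MAE, RMSE, Pearson correlation and accuracy.

    y_true: list of 0/1 labels; y_prob: list of predicted probabilities.
    The 0.5 threshold and the reading of CC as Pearson correlation are
    assumptions made for this sketch, not details taken from the paper.
    """
    n = len(y_true)
    if n == 0 or n != len(y_prob):
        raise ValueError("y_true and y_prob must be non-empty and equally long")

    # Error-based measures on the raw predicted probabilities.
    errors = [p - t for t, p in zip(y_true, y_prob)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)

    # Pearson correlation between labels and predicted probabilities.
    mean_t = sum(y_true) / n
    mean_p = sum(y_prob) / n
    cross = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_prob))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_prob)
    if var_t > 0 and var_p > 0:
        cc = cross / math.sqrt(var_t * var_p)
    else:
        cc = float("nan")  # undefined when either series is constant

    # Accuracy after thresholding the probabilities into class labels.
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    accuracy = sum(int(a == b) for a, b in zip(y_true, y_pred)) / n

    return {"mse": mse, "mae": mae, "rmse": rmse, "cc": cc, "accuracy": accuracy}


# Hypothetical toy data, not taken from the study.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
scores = evaluate(y_true, y_prob)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Applying the same function to the predictions of each of the five models would give a like-for-like comparison on these measures, although the study's own evaluation protocol may differ.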
Received: May 2, 2023
Accepted: June 13, 2023
License
Copyright (c) 2023 Pushpa Publishing House, Prayagraj, India

This work is licensed under a Creative Commons Attribution 4.0 International License.
____________________________
Attribution: Credit Pushpa Publishing House as the original publisher, including title and author(s) if applicable.
No Derivatives: Modifying or creating derivative works not allowed without written permission.
Contact Pushpa Publishing House for more info or permissions.