A COMPARATIVE STUDY OF STATISTICAL AND INTELLIGENT CLASSIFICATION MODELS FOR PREDICTING DIABETES

Fatma Y. Alshenawy; Ehab M. Almetwally

doi:10.17654/0972361723046

Keywords:

classification models, logistic regression, regression neural networks.

Abstract

Classification tasks play a pivotal role in many domains, including healthcare, finance, and marketing, where accurate classification models can drive decision-making and provide valuable insights. Logistic regression has long been a standard method for classification, but recent progress in intelligent models has produced more flexible alternatives. This study explores and compares five classification models for predicting diabetes: logistic regression, robust logistic regression, adaptive splines regression, k-nearest neighbor, and regression neural networks. The study reviews the relevant literature and computes seven performance measures for each of the five models: mean squared error (MSE), mean absolute error (MAE), root mean squared error (RMSE), COV, CC, EC, and accuracy. It also provides a foundation for future research on developing more efficient and accurate classification models and on investigating advanced ensemble techniques that combine the strengths of different models.
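Four of the seven performance measures named in the abstract have standard definitions and can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a binary outcome coded 0/1 (e.g. 1 = diabetic), computes MSE, MAE and RMSE on each model's predicted probability of the positive class, and derives class labels for accuracy with an assumed 0.5 threshold. COV, CC and EC are not reproduced because their exact definitions are not given here. The function name `performance_measures` is hypothetical.

```python
import math


def performance_measures(y_true, p_hat, threshold=0.5):
    """Compute MSE, MAE, RMSE and accuracy for a binary classifier.

    Assumption: y_true holds 0/1 labels and p_hat holds predicted
    probabilities of the positive class. The error measures use the
    probabilities; accuracy uses labels obtained by thresholding.
    """
    if not y_true or len(y_true) != len(p_hat):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)

    # Residuals between observed labels and predicted probabilities.
    residuals = [y - p for y, p in zip(y_true, p_hat)]
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(mse)

    # Hard labels from the (assumed) decision threshold.
    y_pred = [1 if p >= threshold else 0 for p in p_hat]
    accuracy = sum(int(y == yp) for y, yp in zip(y_true, y_pred)) / n

    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "accuracy": accuracy}


if __name__ == "__main__":
    # Toy labels and probabilities, for illustration only (not study data).
    y = [1, 0, 1, 1, 0]
    p = [0.9, 0.2, 0.4, 0.7, 0.1]
    print(performance_measures(y, p))
    # accuracy 0.8; MSE ≈ 0.102, MAE ≈ 0.26, RMSE ≈ 0.319
```

Computing the error measures on probabilities rather than on thresholded labels keeps MSE, MAE and RMSE sensitive to how confident each model is, while accuracy only counts correct labels. Whether the study used this convention is an assumption.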

Received: May 2, 2023
Accepted: June 13, 2023

References

D. W. Aha, D. Kibler and M. K. Albert, Instance-based learning algorithms, Machine Learning 6(1) (1991), 37-66.

N. S. Altman, An introduction to kernel and nearest-neighbor nonparametric regression, The American Statistician 46(3) (1992), 175-185.

K. Beyer, J. Goldstein, R. Ramakrishnan and U. Shaft, When is nearest neighbor meaningful? International Conference on Database Theory, Springer, Berlin, Heidelberg, 1999, pp. 217-235.

E. Cantoni and E. Ronchetti, Robust inference for generalized linear models, Journal of the American Statistical Association 96(455) (2001), 1022-1030.

T. Cover and P. Hart, Nearest neighbor pattern classification, IEEE Transactions on Information Theory 13(1) (1967), 21-27.

D. Firth, Bias reduction of maximum likelihood estimates, Biometrika 80(1) (1993), 27-38.

J. H. Friedman, Multivariate adaptive regression splines, The Annals of Statistics 19(1) (1991), 1-67.

I. Goodfellow, Y. Bengio and A. Courville, Deep Learning, MIT Press, 2016.

T. Hastie and R. Tibshirani, Discriminant adaptive nearest neighbor classification and regression, Advances in Neural Information Processing Systems 8 (1996), 409-415.

T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer Science & Business Media, 2009.

R. Hecht-Nielsen, Theory of the back propagation neural network, Neural Networks 1(Supplement-1) (1989), 445-448.

D. W. Hosmer Jr., S. Lemeshow and R. X. Sturdivant, Applied Logistic Regression, John Wiley & Sons, 2013.

https://doi.org/10.1016/B978-0-12-741252-8.50010-8.

P. Indyk and R. Motwani, Approximate nearest neighbors: towards removing the curse of dimensionality, Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, 1998, pp. 604-613.

A. K. Jain, R. P. W. Duin and J. Mao, Statistical pattern recognition: a review, IEEE Transactions on Pattern Analysis and Machine Intelligence 22(1) (2000), 4-37.

M. Kelleci and U. Celik, A weighted k-nearest neighbor algorithm for diabetes prediction, International Journal of Intelligent Systems and Applications in Engineering 5(3) (2017), 187-191.

R. Kohavi, A study of cross-validation and bootstrap for accuracy estimation and model selection, Proceedings of the 14th International Joint Conference on Artificial Intelligence, Morgan Kaufmann, Vol. 2, 1995, pp. 1137-1143.

S. Perveen, M. Shahbaz and A. Guergachi, Performance analysis of data mining classification techniques to predict diabetes, Procedia Computer Science 155 (2019), 485-490.

K. Polat and S. Günes, An expert system approach based on principal component analysis and adaptive neuro-fuzzy inference system to diagnosis of diabetes disease, Digital Signal Processing 17(4) (2007), 702-710.

S. Riffat and A. Mayere, Performance evaluation of v-trough solar concentrator for water desalination applications, Applied Thermal Engineering 50(1) (2013), 234-244.

D. E. Rumelhart, G. E. Hinton and R. J. Williams, Learning representations by back-propagating errors, Nature 323(6088) (1986), 533-536.

S. V. Stehman, Selecting and interpreting measures of thematic classification accuracy, Remote Sensing of Environment 62(1) (1997), 77-89.

V. N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, 1995.

W. Yu, Y. Wang and Y. Eun, A comparative study of machine learning algorithms for predicting diabetes using electronic health records, International Journal of Environmental Research and Public Health 17(3) (2020), 926.

Advances and Applications in Statistics

Pushpa Publishing House
