A REVIEW OF STATISTICAL METHODS FOR ADDRESSING MISSING DATA
Keywords:
missing data, imputation, multiple imputation
DOI:
https://doi.org/10.17654/0972361725054
Abstract
Consistent and complete data are a fundamental requirement of any research study involving data analysis, since they form the basis for reliable, meaningful, and actionable results. Missing or censored data pose a major challenge for analysis and can lead to unreasonable or misleading results. Omitting incomplete observations causes information loss, which may reduce the precision of the results. Analyzing such data therefore requires techniques designed explicitly for this problem. Researchers have studied missing data since the 1960s. This paper reviews key statistical methods that have been developed to address the challenges of missing data.
Received: March 27, 2025
Accepted: May 21, 2025
Copyright (c) 2025 Pushpa Publishing House, Prayagraj, India

This work is licensed under a Creative Commons Attribution 4.0 International License.
Attribution: Credit Pushpa Publishing House as the original publisher, including title and author(s) if applicable.
No Derivatives: Modifying or creating derivative works not allowed without written permission.
Contact Pushpa Publishing House for more info or permissions.