A SEQUENTIAL METHOD TO SEARCH FOR MULTIPLE OUTLIERS IN MULTIVARIATE DATA
Keywords: multivariate outliers, robust Mahalanobis distances, contamination, sequential testing
DOI: https://doi.org/10.17654/0972361725061
Abstract
Standard multivariate methods such as principal component analysis and discriminant analysis rely on the sample mean vector and covariance matrix, both of which can be strongly affected by even a few outliers. Detecting outliers in multivariate data sets is difficult: classical methods based on Mahalanobis distances may identify scattered outliers well but perform poorly when multiple outliers are clustered. Methods based on robust Mahalanobis distances also perform poorly when the fraction of contamination or the dimension of the data set is high, and they can be computationally expensive. A method for detecting multiple outliers in multivariate data is proposed; it tests for outliers sequentially and uses the leave-one-out approach at many stages. The proposed method is applied to several well-known data sets and is shown to perform better than many known outlier detection methods, regardless of the degree of contamination or the dimension of the data set. It is also shown that it is marginally better to first obtain a clean sample for estimating the mean vector and covariance matrix and then apply classically efficient methods, rather than to use inefficient robust rules for estimation and subsequent outlier detection.
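As a rough illustration of the two ingredients named in the abstract, the sketch below computes classical squared Mahalanobis distances and a leave-one-out variant, in which each observation is measured against the mean and covariance of the remaining observations. It is not the authors' sequential procedure, whose steps and test thresholds are not given here; the function names, the simulated data, and the planted outlier are all assumptions made for this example.

```python
import numpy as np


def classical_sq_distances(X):
    """Squared Mahalanobis distance of each row from the full-sample mean and covariance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    diff = X - mean
    # Solve cov @ z = diff.T instead of forming the inverse explicitly.
    solved = np.linalg.solve(cov, diff.T)
    return np.einsum("ij,ji->i", diff, solved)


def leave_one_out_sq_distances(X):
    """Squared Mahalanobis distance of each row from the mean and covariance of the other rows."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    out = np.empty(n)
    for i in range(n):
        rest = np.delete(X, i, axis=0)  # drop observation i before estimating
        mean = rest.mean(axis=0)
        cov = np.cov(rest, rowvar=False)
        diff = X[i] - mean
        out[i] = diff @ np.linalg.solve(cov, diff)
    return out


# Hypothetical data: 30 bivariate standard normal points plus one planted outlier.
rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((30, 2)), [[8.0, 8.0]]])

d_classical = classical_sq_distances(X)
d_loo = leave_one_out_sq_distances(X)

print("classical distance of planted point:", d_classical[-1])
print("leave-one-out distance of planted point:", d_loo[-1])
```

With the full-sample estimates, a squared distance can never exceed (n - 1)^2 / n, because the suspect point inflates the very covariance matrix used to judge it. Leaving the point out removes that cap, which is one reason leave-one-out distances separate a gross outlier more clearly.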
Received: May 29, 2025
Accepted: July 29, 2025
Copyright (c) 2025 Pushpa Publishing House, Prayagraj, India

This work is licensed under a Creative Commons Attribution 4.0 International License.
____________________________
Attribution: Credit Pushpa Publishing House as the original publisher, including title and author(s) if applicable.
No Derivatives: Modifying or creating derivative works not allowed without written permission.
Contact Pushpa Publishing House for more info or permissions.