IMPROVING THE DIMENSIONALITY REDUCTION OF PCA USING BIVARIATE COPULAS
Keywords:
bivariate copula, Kendall’s tau, dimensionality reduction, correlation, feature extraction, PCA
DOI:
https://doi.org/10.17654/0972361723015
Abstract
Datasets used in modern applications keep growing in size, so it is crucial to identify their properties and reduce their dimensionality without losing important information. The methods proposed for this problem fall into two types: feature extraction and feature selection. Among feature extraction techniques, Principal Component Analysis (PCA) is the most widely used. In this study, we develop a new approach intended to improve the reduction and information extraction achieved by PCA. The proposed method uses bivariate copulas to detect correlation between features and eliminate it, and then performs PCA on the reduced data. It is compared against baseline PCA and against another method that combines multivariate copulas with PCA. The comparison uses real-world data and considers both the degree of dimensionality reduction and the classification accuracy obtained on the reduced data.
Received: November 23, 2022; Accepted: February 7, 2023; Published: February 14, 2023
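The two-stage idea described in the abstract (detect dependence between pairs of features, eliminate it, then run PCA) can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything specific here is an assumption: the sketch measures pairwise dependence with the empirical Kendall's tau (a quantity determined by the bivariate copula of the pair) instead of fitting a parametric copula family, and the 0.9 threshold, the greedy keep-or-drop rule, and the function names `kendall_tau`, `drop_dependent_features`, and `pca` are hypothetical choices made for illustration only.

```python
import numpy as np

def kendall_tau(x, y):
    """Empirical Kendall's tau (tau-a) from pairwise sign agreement, O(n^2)."""
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    # Each unordered pair is counted twice in the sum, hence n*(n-1).
    return (dx * dy).sum() / (n * (n - 1))

def drop_dependent_features(X, threshold=0.9):
    """Assumed rule: keep a column only if |tau| with every kept column is below threshold."""
    kept = []
    for j in range(X.shape[1]):
        if all(abs(kendall_tau(X[:, j], X[:, k])) < threshold for k in kept):
            kept.append(j)
    return kept

def pca(X, n_components):
    """Plain PCA via SVD of the centered data; returns scores and explained-variance ratios."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    return Xc @ Vt[:n_components].T, explained[:n_components]

# Synthetic data (not from the paper): columns 1 and 3 are monotone
# transformations of columns 0 and 2, so their tau with those columns is 1.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, np.exp(a), b, 2 * b + 1])

kept = drop_dependent_features(X, threshold=0.9)
scores, ratio = pca(X[:, kept], n_components=2)
```

Because Kendall's tau depends only on ranks, the redundant column `np.exp(a)` is detected even though its relationship to `a` is nonlinear, which is one motivation for copula-based dependence measures over the linear Pearson correlation.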
Copyright (c) 2023 Pushpa Publishing House, Prayagraj, India

This work is licensed under a Creative Commons Attribution 4.0 International License.
Attribution: Credit Pushpa Publishing House as the original publisher, including title and author(s) if applicable.
No Derivatives: Modifying or creating derivative works not allowed without written permission.
Contact Pushpa Publishing House for more info or permissions.