CALCULATING THE SAMPLE SIZE FOR ORDINARY LEAST SQUARES ESTIMATION IN THE PRESENCE OF MULTICOLLINEARITY
Keywords: multicollinearity, L2 ridge shrinkage method, ordinary least squares (OLS) method.

DOI: https://doi.org/10.17654/0972361723060

Abstract
Multiple linear regression analysis describes the relationship between a response variable and several independent (predictor) variables. This study compares the L2 (ridge) shrinkage method with the ordinary least squares (OLS) method when multicollinearity is present in a dataset, repeating the comparison across several sample sizes $(n = 25, n = 50, n = 200,$ and $n = 1000)$. In the simulated data, the relationship between sample size and covariance was not linear. The results demonstrate that L2 regression performs best and generates parsimonious models in the presence of multicollinearity; the higher the degree of multicollinearity, the smaller the optimal shrinkage parameter. The L2 regularization technique also reduces the standard errors of the regression coefficients and the prediction error of the fitted model. This implies that for every change in the dataset, there is an optimal value of the shrinkage parameter that minimizes the effects of multicollinearity and produces more stable and reliable regression models. In moderation studies, where all of the predictor variables should be retained, L2 regularization is the best alternative. Increasing the sample size stabilizes the estimates, since it reduces the standard errors of the regression coefficients of the predictor variables, and L2 is also the method of choice when multicollinearity greatly inflates the standard errors of the OLS regression coefficients. OLS works best for independent covariates; correlated covariates should be handled with modern regression methods such as L2 regularization.
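The abstract's simulation design is not reproduced here, but the core comparison can be sketched with the closed-form ridge estimator $\hat{\beta} = (X^{\top}X + \lambda I)^{-1}X^{\top}y$, which reduces to OLS at $\lambda = 0$. The data below (two nearly collinear predictors, the sample size, noise scales, and $\lambda = 1$) are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                        # one of the paper's sample sizes
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)       # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 2.0 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=n)

def ridge(X, y, lam):
    """Closed-form ridge solution; lam = 0 gives ordinary least squares."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

beta_ols = ridge(X, y, 0.0)   # coefficients inflate under collinearity
beta_l2 = ridge(X, y, 1.0)    # shrinkage stabilizes the estimates
print("OLS:  ", beta_ols)
print("Ridge:", beta_l2)
```

With near-duplicate columns the individual OLS coefficients become erratic (only their sum is well determined), while the ridge fit shrinks them toward similar, stable values, which is the behavior the study attributes to L2 regularization.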
Received: February 24, 2023
Revised: May 10, 2023
Accepted: June 7, 2023
License
Copyright (c) 2023 Pushpa Publishing House, Prayagraj, India

This work is licensed under a Creative Commons Attribution 4.0 International License.
____________________________
Attribution: Credit Pushpa Publishing House as the original publisher, including title and author(s) if applicable.
No Derivatives: Modifying or creating derivative works not allowed without written permission.
Contact Pushpa Publishing House for more info or permissions.