JP Journal of Biostatistics



IMPACT OF BOX-COX TRANSFORMATION TECHNIQUE ON THE BAYESIAN MAXIMUM ENTROPY (BME) PREDICTION ACCURACY

Authors

  • Emmanuel Ehnon Gongnet
  • Romaric Vihotogbé
  • Codjo Emile Agbangba
  • Tranquillin Affossogbé
  • Koye Djondang
  • Romain Glèlè Kakaï

Keywords:

Box-Cox transformation, skewness, spatial dependence, Bayesian maximum entropy

DOI:

https://doi.org/10.17654/0973514325006

Abstract

This study investigated whether increasing the normality of an attribute with the Box-Cox transformation improves Bayesian Maximum Entropy (BME) prediction accuracy. We also examined whether BME accuracy is affected by sample size or spatial dependence. For hard data, the unconditional sequential approach was used to simulate symmetric data (skewness = 0) and positively skewed data (skewness = 1, 3, 6, and 9), with sample sizes ranging from 100 to 500 in steps of 50. Soft data were randomly distributed throughout a square of unit size and a width of 1.5. The data were then transformed using the Box-Cox transformation. Prediction accuracy was assessed using the Mean Square Error (MSE) and bias, and transformation methods were compared using Multivariate Analysis of Variance (MANOVA). The results showed that BME accuracy is affected by the transformation method but not by sample size or spatial dependence. However, when the transformed data were compared with the untransformed data, the MSE and bias of the untransformed data ($\lambda = 1$) were closer to zero than those of the transformed data ($\lambda \neq 1$). We therefore concluded that BME is robust to skewness, sample size, and spatial dependence.
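To make the quantities in the abstract concrete, the following is a minimal Python sketch of the standard Box-Cox transform (Box and Cox, 1964) and of bias and MSE as accuracy measures. It is not the authors' code: the function names, the log-normal test sample, and the sign convention for bias (predicted minus observed) are assumptions made for illustration, and no BME prediction step is included.

```python
import math
import random


def box_cox(values, lam):
    """Standard Box-Cox transform; defined only for strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        # Limiting case of (y**lam - 1) / lam as lam approaches 0.
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def bias(predicted, observed):
    """Mean signed error (assumed convention: predicted minus observed)."""
    return sum(p - o for p, o in zip(predicted, observed)) / len(observed)


def mse(predicted, observed):
    """Mean square error between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical positively skewed sample (log-normal), not the study's simulated fields.
random.seed(0)
sample = [random.lognormvariate(0.0, 1.0) for _ in range(500)]

print("skewness before:", skewness(sample))
print("skewness after log transform (lambda = 0):", skewness(box_cox(sample, 0)))
```

With $\lambda = 1$ the transform reduces to a shift by one unit, which leaves the shape of the distribution unchanged; this is why $\lambda = 1$ stands for the untransformed case in the comparison above.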

Received: July 13, 2024
Accepted: August 27, 2024

Published

2024-12-10

How to Cite

IMPACT OF BOX-COX TRANSFORMATION TECHNIQUE ON THE BAYESIAN MAXIMUM ENTROPY (BME) PREDICTION ACCURACY. (2024). JP Journal of Biostatistics, 25(1), 127-144. https://doi.org/10.17654/0973514325006
