BETA DIVERGENCE AND KERNELS METHODS WITH SUPPORT VECTOR MACHINE (SVM)
Keywords: Hilbertian metrics, positive definite (pd) kernels, divergence, support vector machine (SVM).

DOI: https://doi.org/10.17654/0972086323006

Abstract
In statistical modelling, distance and divergence measures are widely known and used tools for theoretical and applied statistical inference and for data processing problems. In this paper, we study the well-known beta-divergences (referred to as $\beta$-divergences), a family of cost functions parametrized by a single hyperparameter, and their tight connections with the notions of Hilbertian metrics and positive definite (pd) kernels on probability measures. We describe this dissimilarity measure, which can be symmetrized using two relationships, and compute the degree of symmetry of the $\beta$-divergence on the basis of Hilbertian metrics. We investigate the properties required to build a positive definite kernel from this symmetric $\beta$-divergence, and establish the effectiveness of our approach with experiments conducted on support vector machines (SVMs).
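As a minimal sketch (not the paper's implementation), the standard $\beta$-divergence family and one common symmetrization can be written as follows; the closed form and its Kullback-Leibler and Itakura-Saito limits are the standard ones from the robust-estimation and NMF literature, and the averaged symmetrization shown is only one of the two relationships the paper considers:

```python
import numpy as np

def beta_divergence(x, y, beta):
    """Standard beta-divergence d_beta(x, y) between positive vectors.

    Limits of the family:
      beta -> 1 recovers the Kullback-Leibler divergence,
      beta -> 0 recovers the Itakura-Saito divergence,
      beta  = 2 gives half the squared Euclidean distance.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if beta == 1:  # Kullback-Leibler limit
        return float(np.sum(x * np.log(x / y) - x + y))
    if beta == 0:  # Itakura-Saito limit
        return float(np.sum(x / y - np.log(x / y) - 1))
    return float(np.sum(x**beta + (beta - 1) * y**beta
                        - beta * x * y**(beta - 1)) / (beta * (beta - 1)))

def sym_beta_divergence(x, y, beta):
    """Averaged symmetrization: one simple way to make d_beta symmetric."""
    return 0.5 * (beta_divergence(x, y, beta) + beta_divergence(y, x, beta))
```

For example, `beta_divergence([1, 2], [3, 1], 2)` equals half the squared Euclidean distance, 2.5, while `sym_beta_divergence` is symmetric in its arguments for every `beta`.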
We perform experiments using the conditionally positive definite kernel $K$ and the transformed kernel $K^{(\beta)}$, and show that these kernels yield the same proportion of errors for the Jeffrey divergence and the chi-square divergence.
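The paper's exact kernels $K$ and $K^{(\beta)}$ are not reproduced here, but the generic construction they instantiate can be sketched: given a symmetric divergence that is a squared Hilbertian metric (the symmetric chi-square divergence is a classical example), the generalized-RBF map $K(x, y) = \exp(-\gamma D(x, y))$ is positive definite by Schoenberg's theorem, and the resulting Gram matrix can be fed to an SVM as a precomputed kernel (e.g. scikit-learn's `SVC(kernel='precomputed')`). The choice of the chi-square divergence and of $\gamma = 1$ below is illustrative:

```python
import numpy as np

def sym_chi2(p, q):
    """Symmetric chi-square divergence between probability vectors;
    it is a squared Hilbertian metric, so exp(-gamma * sym_chi2) is pd."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum((p - q) ** 2 / (p + q)))

def gram_matrix(X, gamma=1.0):
    """Generalized-RBF Gram matrix K[i, j] = exp(-gamma * D(x_i, x_j))."""
    n = len(X)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = np.exp(-gamma * sym_chi2(X[i], X[j]))
    return K

rng = np.random.default_rng(0)
X = rng.random((20, 5)) + 1e-6          # strictly positive entries
X /= X.sum(axis=1, keepdims=True)       # rows are probability vectors
K = gram_matrix(X)
print(np.linalg.eigvalsh(K).min() > -1e-10)  # prints True: K is (numerically) psd
```

Checking that the smallest eigenvalue of the Gram matrix is nonnegative, as above, is a simple empirical sanity check of positive definiteness before handing the matrix to an SVM.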
Received: September 27, 2022
Accepted: December 26, 2022
Published: May 8, 2023
License
Copyright (c) 2023 PUSHPA PUBLISHING HOUSE, PRAYAGRAJ, INDIA

This work is licensed under a Creative Commons Attribution 4.0 International License.
____________________________
Attribution: Credit Pushpa Publishing House as the original publisher, including title and author(s) if applicable.
Non-Commercial Use: For non-commercial purposes only. No commercial activities without explicit permission.
No Derivatives: Modifying or creating derivative works not allowed without written permission.
Contact Pushpa Publishing House for more info or permissions.

