CLUSTERING OF COUNT DATA USING POISSON DISTRIBUTION

S. Krishnamoorthy; B. Jaganathan

doi:10.17654/0972361725053

Keywords:

model-based clustering, information criteria, expectation-maximization algorithm, Poisson distribution

Abstract

Cluster analysis is often used to identify homogeneous groups within complex datasets, particularly when traditional distance-based methods struggle with high-dimensional or skewed data. In this study, we propose a model-based clustering approach for count data using a finite mixture of Poisson distributions. The mixture accommodates overdispersion and skewness, and its parameters are estimated via the expectation-maximization (EM) algorithm. The Akaike and Bayesian information criteria (AIC and BIC) are used for model selection. A key novelty of this work lies in applying Poisson mixture models to a large-scale health survey dataset, the Behavioral Risk Factor Surveillance System (BRFSS), with BMI treated as discrete count data. The proposed method is also benchmarked against lognormal mixture models and shows superior performance in terms of misclassification rate and adjusted Rand index (ARI). Additionally, the impact of initialization strategies on EM convergence is examined using both real and simulated datasets. The results indicate that Poisson mixture-based clustering offers a more effective and interpretable solution for count data than traditional approaches.
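
The approach summarized in the abstract can be illustrated with a minimal sketch of EM for a univariate finite Poisson mixture, followed by AIC/BIC-based selection of the number of components. This is not the authors' implementation: the function name fit_poisson_mixture, the quantile-based initialization, the stopping tolerance, and the simulated data are all assumptions made for illustration.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import poisson


def fit_poisson_mixture(x, k, n_iter=500, tol=1e-8, seed=0):
    """Fit a k-component Poisson mixture to 1-D counts with EM (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = x.size

    # Assumed initialization: spread the rates over sample quantiles and add a
    # small positive jitter so every rate is strictly positive.
    lam = np.quantile(x, (np.arange(k) + 0.5) / k) + rng.uniform(0.1, 0.5, size=k)
    pi = np.full(k, 1.0 / k)

    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability.
        log_w = np.log(pi) + poisson.logpmf(x[:, None], lam[None, :])
        log_norm = logsumexp(log_w, axis=1)
        resp = np.exp(log_w - log_norm[:, None])
        ll = log_norm.sum()  # log-likelihood at the current parameters

        # M-step: weighted proportions and weighted means.
        nk = np.maximum(resp.sum(axis=0), 1e-12)  # guard against empty components
        pi = nk / n
        lam = np.maximum((resp * x[:, None]).sum(axis=0) / nk, 1e-12)

        # EM never decreases the log-likelihood, so a small gain means convergence.
        if ll - prev_ll < tol:
            break
        prev_ll = ll

    # Evaluate the log-likelihood and hard cluster labels at the final parameters.
    log_w = np.log(pi) + poisson.logpmf(x[:, None], lam[None, :])
    ll = logsumexp(log_w, axis=1).sum()
    n_params = 2 * k - 1  # k rates plus (k - 1) free mixing weights
    return {
        "weights": pi,
        "rates": lam,
        "loglik": ll,
        "aic": -2.0 * ll + 2.0 * n_params,
        "bic": -2.0 * ll + n_params * np.log(n),
        "labels": log_w.argmax(axis=1),
    }


if __name__ == "__main__":
    # Simulated counts from two Poisson groups (illustrative data, not BRFSS).
    rng = np.random.default_rng(1)
    counts = np.concatenate([rng.poisson(3, 500), rng.poisson(20, 500)])
    for k in range(1, 5):
        fit = fit_poisson_mixture(counts, k)
        print(k, round(fit["aic"], 1), round(fit["bic"], 1))
```

Under this sketch, the number of clusters would be chosen as the k with the smallest BIC (or AIC), and each observation is assigned to the component with the highest posterior probability. Because EM can converge to different local optima, rerunning the fit with several seeds or initializations is a reasonable check.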

Received: September 28, 2024
Accepted: May 22, 2025

Advances and Applications in Statistics
