Advances and Applications in Statistics

Advances and Applications in Statistics is an internationally recognized journal indexed in the Emerging Sources Citation Index (ESCI). It provides a platform for original research papers and survey articles in all areas of statistics, both computational and experimental in nature.


CLUSTERING OF COUNT DATA USING POISSON DISTRIBUTION

Authors

  • S. Krishnamoorthy
  • B. Jaganathan

Keywords:

model-based clustering, information criteria, expectation-maximization algorithm, Poisson distribution

DOI:

https://doi.org/10.17654/0972361725053

Abstract

Cluster analysis is often used to identify homogeneous groups within complex datasets, particularly when traditional distance-based methods struggle with high-dimensional or skewed data. In this study, we propose a model-based clustering approach for count data using a finite mixture of Poisson distributions. By combining several Poisson components, the model accommodates overdispersion and skewness, and its parameters are estimated via the expectation-maximization (EM) algorithm. Information criteria such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are used for model selection. A key novelty of this work is the application of Poisson mixture models to a large-scale health survey dataset, the Behavioral Risk Factor Surveillance System (BRFSS), treating body mass index (BMI) as discrete count data. The proposed method is also benchmarked against lognormal mixture models and shows superior performance in terms of misclassification rate and adjusted Rand index (ARI). Additionally, the impact of initialization strategies on EM convergence is examined using both real and simulated datasets. The results confirm that Poisson mixture-based clustering offers a more effective and interpretable solution for count data than traditional approaches.
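The workflow summarized in the abstract (EM estimation of a Poisson mixture, selection of the number of components K by information criteria, repeated random starts, and evaluation by ARI) can be illustrated with a minimal sketch. It assumes a single count variable (such as rounded BMI). EM runs with closed-form updates: mixing weight pi_j = n_j / n and rate lambda_j = sum_i r_ij x_i / n_j, where r_ij are the responsibilities. AIC and BIC are computed with p = 2K - 1 free parameters (K rates plus K - 1 weights). The helper names, the initialization scheme (random distinct observed values), and the tolerance and iteration limits are hypothetical choices for illustration, not the authors' implementation. The lognormal benchmark is not reproduced. Only the Python standard library is used, and the Knuth Poisson sampler serves solely to simulate data.

```python
import math
import random
from collections import Counter


def poisson_logpmf(x, lam):
    """log P(X = x) for X ~ Poisson(lam), with x a non-negative integer."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def e_step(values, freqs, weights, lams):
    """Responsibilities for each distinct count and the total log-likelihood."""
    resp, ll = [], 0.0
    for x, f in zip(values, freqs):
        logs = [math.log(w) + poisson_logpmf(x, lam) for w, lam in zip(weights, lams)]
        m = max(logs)  # log-sum-exp trick for numerical stability
        s = sum(math.exp(v - m) for v in logs)
        ll += f * (m + math.log(s))  # each distinct count weighted by its frequency
        resp.append([math.exp(v - m) / s for v in logs])
    return resp, ll


def fit_poisson_mixture(data, k, n_iter=500, tol=1e-6, seed=0):
    """EM for a k-component univariate Poisson mixture (illustrative sketch)."""
    counts = Counter(data)  # counts repeat, so work on distinct values + frequencies
    values, freqs = list(counts), list(counts.values())
    n = len(data)
    rng = random.Random(seed)
    # Hypothetical initialization: k distinct observed values (+0.5 keeps rates > 0)
    lams = [x + 0.5 for x in rng.sample(values, k)]
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(n_iter):
        resp, ll = e_step(values, freqs, weights, lams)
        if ll - prev_ll < tol:  # EM never decreases the likelihood
            break
        prev_ll = ll
        # M-step: closed-form updates (floors guard against empty components)
        for j in range(k):
            nj = max(sum(f * r[j] for f, r in zip(freqs, resp)), 1e-12)
            weights[j] = nj / n
            lams[j] = max(sum(f * r[j] * x for f, r, x in zip(freqs, resp, values)) / nj, 1e-10)
    resp, ll = e_step(values, freqs, weights, lams)
    # Hard (MAP) assignment of each observation to its most probable component
    label_of = {x: max(range(k), key=lambda j: r[j]) for x, r in zip(values, resp)}
    return weights, lams, ll, [label_of[x] for x in data]


def information_criteria(ll, k, n):
    """AIC and BIC with p = k rates + (k - 1) free mixing weights."""
    p = 2 * k - 1
    return {"AIC": -2 * ll + 2 * p, "BIC": -2 * ll + p * math.log(n)}


def select_k(data, k_max=4, n_starts=5):
    """Fit k = 1..k_max, keeping the best of several random starts for each k."""
    results = {}
    for k in range(1, k_max + 1):
        fits = [fit_poisson_mixture(data, k, seed=s) for s in range(n_starts)]
        best = max(fits, key=lambda fit: fit[2])  # highest log-likelihood
        results[k] = (best, information_criteria(best[2], k, len(data)))
    return results


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings (Hubert-Arabie form)."""
    def comb2(m):
        return m * (m - 1) / 2
    sum_ij = sum(comb2(c) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb2(c) for c in Counter(a).values())
    sum_b = sum(comb2(c) for c in Counter(b).values())
    expected = sum_a * sum_b / comb2(len(a))
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small-to-moderate lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


if __name__ == "__main__":
    rng = random.Random(42)
    truth = [0] * 500 + [1] * 500
    data = [sample_poisson(3.0, rng) for _ in range(500)] + [sample_poisson(15.0, rng) for _ in range(500)]
    results = select_k(data)
    best_k = min(results, key=lambda k: results[k][1]["BIC"])
    weights, lams, ll, labels = results[best_k][0]
    print("K by BIC:", best_k, "rates:", [round(v, 2) for v in lams])
    print("ARI vs. simulated truth:", round(adjusted_rand_index(truth, labels), 3))
```

Working on distinct counts and their frequencies, rather than on each observation, keeps every EM iteration cheap, because count data such as rounded BMI take relatively few distinct values. The estimates are the same as those from observation-level EM. Running several seeds per K addresses the sensitivity of EM to its starting values, which the abstract lists as a topic of study.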

Received: September 28, 2024
Accepted: May 22, 2025

Published

07-07-2025

Issue

Vol. 92 No. 8 (2025)

Section

Articles

How to Cite

Krishnamoorthy, S., & Jaganathan, B. (2025). Clustering of count data using Poisson distribution. Advances and Applications in Statistics, 92(8), 1183-1200. https://doi.org/10.17654/0972361725053
