SUPERVISED MACHINE LEARNING: A COMPARISON OF POISSON AND NEGATIVE BINOMIAL REGRESSION FOR COUNT DATA ANALYSIS
Keywords:
supervised machine learning, regression, count data, Poisson regression, negative binomial regression
DOI:
https://doi.org/10.17654/0972361725040
Abstract
This study applies two supervised machine learning techniques for count data, Poisson regression and negative binomial regression, to forecast outgoing mail volume for the General Directorate of Posts of Saudi Arabia, using data from 2002 to 2006. The dataset covers 13 administrative regions and consists of 65 observations on three variables: the dependent variable is the number of outgoing mail items, and the independent variables are year and region. Exploratory data analysis revealed significant overdispersion in the data, together with a large number of zero observations. An initial Poisson regression analysis highlighted that model's limitations in handling these characteristics. In contrast, the negative binomial regression model performed better, achieving a lower Mean Absolute Prediction Error (MAPE) of 34,026.7 compared with 34,253.08 for the Poisson model. In addition, the likelihood-based criteria (the Likelihood Ratio Test, AIC, and BIC) consistently indicated that the negative binomial regression model fits the data better, reflecting the underlying overdispersion. Based on these findings, the negative binomial regression model is recommended as the primary approach for predicting outgoing mail volume for the General Directorate of Posts of Saudi Arabia.
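The likelihood-based comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or data. The counts are invented, both models are intercept-only (no year or region terms), and the negative binomial dispersion parameter `alpha` is a method-of-moments estimate rather than a maximum-likelihood fit. The NB2 parameterization, Var(y) = mu + alpha * mu^2, is also an assumption, as are all function and variable names.

```python
import math
import statistics

# Hypothetical overdispersed counts with several zeros (NOT the study's data).
y = [0, 0, 3, 12, 45, 7, 0, 150, 22, 9]
n = len(y)

mean_y = statistics.mean(y)
var_y = statistics.variance(y)   # sample variance
dispersion = var_y / mean_y      # close to 1 if a Poisson model were adequate

# Intercept-only models: every fitted mean equals the sample mean.
mu = [mean_y] * n

# Method-of-moments estimate of alpha from Var(y) = mu + alpha * mu**2 (assumption).
alpha = (var_y - mean_y) / mean_y ** 2


def poisson_loglik(y, mu):
    """Poisson log-likelihood of counts y given fitted means mu."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))


def negbin_loglik(y, mu, alpha):
    """NB2 log-likelihood of counts y given fitted means mu and dispersion alpha."""
    r = 1.0 / alpha
    return sum(math.lgamma(yi + r) - math.lgamma(r) - math.lgamma(yi + 1)
               + r * math.log(r / (r + mi)) + yi * math.log(mi / (r + mi))
               for yi, mi in zip(y, mu))


def aic(loglik, k):
    """Akaike information criterion for a model with k parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian (Schwarz) information criterion for k parameters, n observations."""
    return k * math.log(n) - 2 * loglik


ll_pois = poisson_loglik(y, mu)
ll_nb = negbin_loglik(y, mu, alpha)

# Likelihood ratio statistic for Poisson (nested) against negative binomial.
lr_stat = 2 * (ll_nb - ll_pois)

print(f"variance/mean = {dispersion:.1f}")
print(f"Poisson: logLik={ll_pois:.1f} AIC={aic(ll_pois, 1):.1f} BIC={bic(ll_pois, 1, n):.1f}")
print(f"NegBin:  logLik={ll_nb:.1f} AIC={aic(ll_nb, 2):.1f} BIC={bic(ll_nb, 2, n):.1f}")
print(f"LR statistic = {lr_stat:.1f}")
```

A variance-to-mean ratio far above 1 signals overdispersion. The negative binomial model then pays for one extra parameter but gains far more in log-likelihood, so its AIC and BIC are lower. Because `alpha` here is not the maximum-likelihood estimate, the LR statistic in this sketch understates the value a full fit would give.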
Received: October 27, 2024
Accepted: February 11, 2025
License
Copyright (c) 2025 Pushpa Publishing House, Prayagraj, India

This work is licensed under a Creative Commons Attribution 4.0 International License.
____________________________
Attribution: Credit Pushpa Publishing House as the original publisher, including title and author(s) if applicable.
No Derivatives: Modifying or creating derivative works not allowed without written permission.
Contact Pushpa Publishing House for more info or permissions.