Abstract:
Overdispersion, which is often associated with count data, is difficult to handle with a
single-parameter regression model such as the Poisson regression model. Previous attempts to
modify the Poisson regression model with additional parameters did not take
cognisance of the different levels of overdispersion, even though modification is not
always necessary. Unnecessary modification distorts the standard errors and
leads to wrong conclusions. Therefore, this study was aimed at determining the
threshold for modification in some count data models when the problem of
overdispersion is unavoidable.
Fuzzy 𝑐-partition was used to classify the degree of overdispersion severity into not
severe, moderate, severe, and very severe. A membership function was constructed for
each class over its range of the fuzzy dispersion percentage (𝑑): 0 for not severe with
𝑑 ≤ 10, (4𝑑 − 40)/210 for moderate with 10 < 𝑑 ≤ 40, 𝑑/70 for severe with 40 < 𝑑 ≤ 70,
and 1 for very severe with 𝑑 > 70. The universal set was the dispersion percentage,
𝐷 = ((𝑣 − 𝑚)/𝑚) × 100%, where 𝑣 is the variance and 𝑚 the mean. Four models: Poisson
(PO), Negative Binomial (NB), Com-Poisson (CP), and Generalised Poisson (GP)
were used to simulate the benchmark for modification. Random samples of different sizes,
including 𝑛 = 20 (small sample) and 𝑛 = 5000 (large sample), were used with
mean (µ) = 0.01, 0.05, 1.00, 2.00 and variance (σ²) = 0.05, 0.50, 1.50, 2.50,
respectively. The ratio of the residual deviance of PO (simplest model) to its degrees of
freedom was used to detect the presence of overdispersion in the count data. The
averaging method was used to determine the threshold (𝐷̄). The models were
validated with monthly road crash data from the Federal Road Safety Corps for the 36
states and the Federal Capital Territory of Nigeria from 2014 to 2018, and the Akaike
Information Criterion (AIC) was used for model selection.
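
The severity classification described above can be sketched in a few lines of Python. This is a minimal illustration and not the study's own code: the function names are hypothetical, the dispersion percentage is assumed to be ((𝑣 − 𝑚)/𝑚) × 100, and the deviance ratio is computed for an intercept-only Poisson fit, which is a simplifying assumption (the study may have fitted covariates).

```python
import math


def dispersion_percentage(variance, mean):
    """Assumed definition: D = ((v - m) / m) * 100. Requires mean > 0."""
    return (variance - mean) / mean * 100.0


def severity_membership(d):
    """Piecewise membership value for a dispersion percentage d."""
    if d <= 10:
        return 0.0                   # not severe
    if d <= 40:
        return (4 * d - 40) / 210    # moderate
    if d <= 70:
        return d / 70                # severe
    return 1.0                       # very severe


def severity_class(d):
    """Label of the class whose range contains d."""
    if d <= 10:
        return "not severe"
    if d <= 40:
        return "moderate"
    if d <= 70:
        return "severe"
    return "very severe"


def poisson_deviance_ratio(counts):
    """Residual deviance / degrees of freedom for an intercept-only
    Poisson fit (assumption: no covariates, fitted mean = sample mean)."""
    n = len(counts)
    mu = sum(counts) / n
    deviance = 2 * sum(
        (y * math.log(y / mu) if y > 0 else 0.0) - (y - mu) for y in counts
    )
    return deviance / (n - 1)
```

The two middle pieces of the membership function agree at 𝑑 = 40 (both give 4/7), so the function is continuous there. A deviance ratio well above 1 is the usual signal of overdispersion relative to the Poisson model.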
The threshold 𝐷̄ for models PO, NB, CP and GP, given that 𝑛 = 20, were 24.2, 69.4,
34.8 and 32.6%; 26.6, 73.6, 26.5 and 27.1%; 23.1, 75.2, 25.1 and 37.1%; 30.4, 77.5,
54.9 and 24.5%, respectively. The highest 𝐷̄, at different values of µ and σ² for PO,
NB, CP and GP when 𝑛 = 20 were 30.4, 77.5, 54.9 and 37.1%, respectively. For 𝑛 =
5000, 𝐷̄ were 27.7, 74.9, 22.1 and 28.3%; 27.6, 74.5, 22.2 and 28.9%; 27.9, 38.2,
22.2 and 29.2%; 28.2, 29.1, 22.2 and 28.3%, respectively. The highest 𝐷̄, at different
values of µ and σ² for PO, NB, CP and GP when 𝑛 = 5000 were 28.2, 74.9, 22.2
and 29.2%, respectively, indicating points for modifications. The ratio of the residual
deviance of PO to its degrees of freedom for the road crash data was 42.0, flagging very
severe overdispersion (95.5%) with a membership value of 1. The AIC values for PO, NB, CP
and GP were 8826.7, 8657.6, 2211.0 and 2205.4, respectively. GP had the lowest AIC and
was therefore the best model.
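
As a small worked check, the highest thresholds and the AIC-based choice can be reproduced from the figures reported above. The variable and function names are illustrative only, and each row is assumed to correspond to one (µ, σ²) setting in the order listed.

```python
MODELS = ("PO", "NB", "CP", "GP")

# Reported thresholds (%) per sample size; one row per (mean, variance) setting.
thresholds = {
    20: [
        (24.2, 69.4, 34.8, 32.6),
        (26.6, 73.6, 26.5, 27.1),
        (23.1, 75.2, 25.1, 37.1),
        (30.4, 77.5, 54.9, 24.5),
    ],
    5000: [
        (27.7, 74.9, 22.1, 28.3),
        (27.6, 74.5, 22.2, 28.9),
        (27.9, 38.2, 22.2, 29.2),
        (28.2, 29.1, 22.2, 28.3),
    ],
}

# Reported AIC values for the road crash data.
aic = {"PO": 8826.7, "NB": 8657.6, "CP": 2211.0, "GP": 2205.4}


def highest_thresholds(rows):
    """Column-wise maximum: the highest threshold for each model."""
    return {model: max(column) for model, column in zip(MODELS, zip(*rows))}


def best_by_aic(aic_values):
    """Model with the smallest AIC."""
    return min(aic_values, key=aic_values.get)
```

Calling `highest_thresholds(thresholds[20])` and `highest_thresholds(thresholds[5000])` returns the per-model maxima quoted in the text, and `best_by_aic(aic)` returns `"GP"`.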
The overdispersion severity thresholds for modifying the Poisson, Negative
Binomial, Com-Poisson, and Generalised Poisson models were determined. These
thresholds could be used to minimise wrong conclusions arising from
defective standard errors.