💡 Theoretical Discrete Distributions
Binom[n, p]: the distribution of the number of successes when an experiment with success rate p is repeated n times.
Geom[p]: the distribution of the number of failures before the first success when an experiment with success rate p is repeated.
NBinom[n, p]: the distribution of the number of failures before the n-th success when an experiment with success rate p is repeated.
pacman::p_load(dplyr, vcd)
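The three definitions above map onto base-R probability functions. The following is a minimal sketch; the values of n, p and the evaluation points are illustrative assumptions, not taken from the data sets in this section.

```r
# Sketch: base-R probability mass functions for the three distributions.
# n, p and the evaluation points below are illustrative values only.
n <- 10    # number of repetitions (binomial)
p <- 0.3   # success rate of a single experiment

# Binomial: probability of exactly 3 successes in n repetitions
p_binom <- dbinom(3, size = n, prob = p)

# Geometric: probability of exactly 2 failures before the first success
p_geom <- dgeom(2, prob = p)

# Negative binomial: probability of exactly 2 failures before the 4th success
p_nbinom <- dnbinom(2, size = 4, prob = p)

# With size = 1 the negative binomial reduces to the geometric distribution
same_as_geom <- all.equal(dgeom(0:5, prob = p), dnbinom(0:5, size = 1, prob = p))

print(c(binom = p_binom, geom = p_geom, nbinom = p_nbinom))
print(same_as_geom)
```

Note that R counts failures, not trials, for both dgeom and dnbinom, which matches the wording of the definitions.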
Data HorseKicks: number of events (deaths) per year per corps
par(mfrow=c(1,1), cex=0.7)
HorseKicks
nDeaths
0 1 2 3 4
109 65 22 3 1
Test: H0: The data fit a Poisson distribution
fit = goodfit(HorseKicks, type = "poisson")
summary(fit)
Goodness-of-fit test for poisson distribution
X^2 df P(> X^2)
Likelihood Ratio 0.86822 3 0.83309
p = 0.833 > 0.05: the data are not significantly different from a Poisson distribution.
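To see where the likelihood-ratio statistic comes from, the test can be roughly reproduced by hand. This is a sketch under assumptions: it takes λ to be the maximum-likelihood estimate (the frequency-weighted mean) and compares the observed counts with N · dpois(k, λ) over the observed categories only; the internals of goodfit may differ in detail, so expect values close to, not necessarily identical with, the summary output.

```r
# Observed frequencies, copied from the HorseKicks table
deaths   <- 0:4
observed <- c(109, 65, 22, 3, 1)

# ML estimate of the Poisson rate: the frequency-weighted mean
lambda   <- sum(deaths * observed) / sum(observed)

# Fitted frequencies under Poisson(lambda)
expected <- sum(observed) * dpois(deaths, lambda)

# Likelihood-ratio statistic: G^2 = 2 * sum(O * log(O / E))
G2 <- 2 * sum(observed * log(observed / expected))

# Degrees of freedom: categories - 1 - number of estimated parameters
df <- length(observed) - 1 - 1

# Upper-tail chi-squared probability
p_value <- pchisq(G2, df = df, lower.tail = FALSE)

print(c(G2 = G2, df = df, p_value = p_value))
```

A large p-value here means the observed counts are close to what a Poisson model would produce, so H0 is not rejected.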
Parameters: What is \(\lambda\)?
fit$par
$lambda
[1] 0.61
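For a Poisson model the maximum-likelihood estimate of \(\lambda\) is the sample mean, and a Poisson distribution has variance equal to its mean. The sketch below checks both directly from the frequency table; the variable names are illustrative.

```r
# Frequency table for HorseKicks, expanded into one value per corps-year
deaths   <- 0:4
observed <- c(109, 65, 22, 3, 1)
x <- rep(deaths, observed)

lambda_hat <- mean(x)   # sample mean = ML estimate of lambda
s2         <- var(x)    # sample variance, should be close to the mean

print(c(lambda_hat = lambda_hat, variance = s2))
```

A variance close to the mean is an informal sign that a Poisson model is plausible for these data.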
Application: What is the probability of nDeaths >= 2?
1 - ppois(1, fit$par$lambda)
[1] 0.12521
🧙 Discussion:
If an insurance company wants to design a policy for death by horse kick and needs to know P[nDeaths >= 5]:
■ Can you estimate the probability from the data?
■ Can you estimate it from the model?
■ Which way is better?
What is the probability of nDeaths >= 5?
1 - ppois(4, fit$par$lambda)
[1] 0.00042497
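The discussion questions can be made concrete by putting the two estimates side by side. This is a sketch that hard-codes the frequency table and the reported λ = 0.61; the variable names are illustrative.

```r
# Frequency table and fitted rate, copied from the output in this section
deaths   <- 0:4
observed <- c(109, 65, 22, 3, 1)
lambda   <- 0.61

# Empirical estimate: share of corps-years with 5 or more deaths.
# No such observation exists, so the estimate is exactly 0.
p_empirical <- sum(observed[deaths >= 5]) / sum(observed)

# Model-based estimate from the fitted Poisson distribution
p_model <- ppois(4, lambda, lower.tail = FALSE)

print(c(empirical = p_empirical, model = p_model))
```

With only 200 observations the data cannot resolve an event this rare, whereas the fitted model extrapolates a small but nonzero probability, provided the Poisson assumption holds in the tail.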
Data Federalist: In a set of Federalist Papers, the number of occurrences of “may” per paragraph.
Federalist
nMay
0 1 2 3 4 5 6
156 63 29 8 4 1 1
Test: H0: The data fit a Poisson distribution
fit <- goodfit(Federalist, type = "poisson")
summary(fit)
Goodness-of-fit test for poisson distribution
X^2 df P(> X^2)
Likelihood Ratio 25.243 5 0.00012505
H0: The data fit a negative binomial distribution
fit = goodfit(Federalist, type = "nbinomial")
summary(fit)
Goodness-of-fit test for nbinomial distribution
X^2 df P(> X^2)
Likelihood Ratio 1.964 4 0.74238
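One informal way to see why the Poisson fit is rejected while the negative binomial is not is to compare the sample mean and variance: a Poisson distribution forces them to be equal, while a negative binomial allows the variance to exceed the mean. The sketch below works from the frequency table only; the variable names are illustrative.

```r
# Frequency table for Federalist, expanded into one value per paragraph
nMay     <- 0:6
observed <- c(156, 63, 29, 8, 4, 1, 1)
x <- rep(nMay, observed)

m <- mean(x)        # sample mean
v <- var(x)         # sample variance
dispersion <- v / m # a ratio well above 1 suggests overdispersion

print(c(mean = m, variance = v, dispersion = dispersion))
```

A dispersion ratio clearly above 1 is a hint, not a formal test; the goodness-of-fit tests remain the basis for the conclusion.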
Parameters: What are the parameters?
fit$par
$size
[1] 1.1863
$prob
[1] 0.64376
The fitted distribution is NBinom[n=1.19, p=0.64].
Plot: What does the distribution look like?
par(mar=c(3,3,3,1), cex=0.7)
dnbinom(0:10, fit$par$size, fit$par$prob) %>% barplot(names=0:10)
Estimation: What is the probability that 2 <= nMay <= 6?
#
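One possible way to answer the estimation question is to difference the cumulative distribution function. This is a sketch that hard-codes the size and prob values reported by fit$par; with a live fit object, fit$par$size and fit$par$prob could be used instead.

```r
# Parameters copied from the fit$par output in this section
size <- 1.1863
prob <- 0.64376

# P[2 <= nMay <= 6] = P[nMay <= 6] - P[nMay <= 1]
p_range <- pnbinom(6, size, prob) - pnbinom(1, size, prob)

# Cross-check: sum the probability mass over 2..6 directly
p_check <- sum(dnbinom(2:6, size, prob))

print(c(p_range = p_range, p_check = p_check))
```

Note the lower bound: subtracting pnbinom(1, ...) keeps nMay = 2 inside the range, whereas subtracting pnbinom(2, ...) would drop it.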
💡 Steps to Apply Theoretical Distribution