Poisson binomial distribution
In probability theory and statistics, the Poisson binomial distribution is the discrete probability distribution of a sum of independent Bernoulli trials that are not necessarily identically distributed. The concept is named after Siméon Denis Poisson.
Parameters | — success probabilities for each of the n trials | ||
---|---|---|---|
Support | k ∈ { 0, …, n } | ||
PMF | |||
CDF | |||
Mean | |||
Variance | |||
Skewness | |||
Ex. kurtosis | |||
MGF | |||
CF | |||
PGF |
In other words, it is the probability distribution of the number of successes in a collection of n independent yes/no experiments with success probabilities . The ordinary binomial distribution is a special case of the Poisson binomial distribution, when all success probabilities are the same, that is .
Definitions
Probability Mass Function
The probability of having k successful trials out of a total of n can be written as the sum [1]
where is the set of all subsets of k integers that can be selected from {1,2,3,...,n}. For example, if n = 3, then . is the complement of , i.e. .
will contain elements, the sum over which is infeasible to compute in practice unless the number of trials n is small (e.g. if n = 30, contains over 1020 elements). However, there are other, more efficient ways to calculate .
As long as none of the success probabilities are equal to one, one can calculate the probability of k successes using the recursive formula [2] [3]
where
The recursive formula is not numerically stable, and should be avoided if is greater than approximately 20. An alternative is to use a divide-and-conquer algorithm: if we assume is a power of two, denoting by the Poisson binomial of and the convolution operator, we have .
Another possibility is using the discrete Fourier transform.[4]
where and .
Still other methods are described in "Statistical Applications of the Poisson-Binomial and conditional Bernoulli distributions" by Chen and Liu.[5]
Properties
Mean and Variance
Since a Poisson binomial distributed variable is a sum of n independent Bernoulli distributed variables, its mean and variance will simply be sums of the mean and variance of the n Bernoulli distributions:
For fixed values of the mean () and size (n), the variance is maximal when all success probabilities are equal and we have a binomial distribution. When the mean is fixed, the variance is bounded from above by the variance of the Poisson distribution with the same mean which is attained asymptotically as n tends to infinity.
Entropy
There is no simple formula for the entropy of a Poisson binomial distribution, but the entropy is bounded above by the entropy of a binomial distribution with the same number parameter and the same mean. Therefore, the entropy is also bounded above by the entropy of a Poisson distribution with the same mean.[6]
The Shepp–Olkin concavity conjecture, due to Lawrence Shepp and Ingram Olkin in 1981, states that the entropy of a Poisson binomial distribution is a concave function of the success probabilities .[7] This conjecture was proved by Erwan Hillion and Oliver Johnson in 2015.[8] The Shepp-Olkin monotonicity conjecture, also from the same 1981 paper, is that the entropy is monotone increasing in , if all . This conjecture was also proved by Hillion and Johnson, in 2019 [9]
Chernoff bound
The probability that a Poisson binomial distribution gets large, can be bounded using its moment generating function as follows (valid when and for any ):
where we took . This is similar to the tail bounds of a binomial distribution.
Computational methods
The reference [10] discusses techniques of evaluating the probability mass function of the Poisson binomial distribution. The following software implementations are based on it:
- An R package poibin was provided along with the paper,[10] which is available for the computing of the cdf, pmf, quantile function, and random number generation of the Poisson binomial distribution. For computing the PMF, a DFT algorithm or a recursive algorithm can be specified to compute the exact PMF, and approximation methods using the normal and Poisson distribution can also be specified.
- poibin - Python implementation - can compute the PMF and CDF, uses the DFT method described in the paper for doing so.
See also
References
- Wang, Y. H. (1993). "On the number of successes in independent trials" (PDF). Statistica Sinica. 3 (2): 295–312.
- Shah, B. K. (1994). "On the distribution of the sum of independent integer valued random variables". American Statistician. 27 (3): 123–124. JSTOR 2683639.
- Chen, X. H.; A. P. Dempster; J. S. Liu (1994). "Weighted finite population sampling to maximize entropy" (PDF). Biometrika. 81 (3): 457. doi:10.1093/biomet/81.3.457.
- Fernandez, M.; S. Williams (2010). "Closed-Form Expression for the Poisson-Binomial Probability Density Function". IEEE Transactions on Aerospace and Electronic Systems. 46 (2): 803–817. Bibcode:2010ITAES..46..803F. doi:10.1109/TAES.2010.5461658. S2CID 1456258.
- Chen, S. X.; J. S. Liu (1997). "Statistical Applications of the Poisson-Binomial and conditional Bernoulli distributions". Statistica Sinica. 7: 875–892.
- Harremoës, P. (2001). "Binomial and Poisson distributions as maximum entropy distributions" (PDF). IEEE Transactions on Information Theory. 47 (5): 2039–2041. doi:10.1109/18.930936.
- Shepp, Lawrence; Olkin, Ingram (1981). "Entropy of the sum of independent Bernoulli random variables and of the multinomial distribution". In Gani, J.; Rohatgi, V.K. (eds.). Contributions to probability: A collection of papers dedicated to Eugene Lukacs. New York: Academic Press. pp. 201–206. ISBN 0-12-274460-8. MR 0618689.
- Hillion, Erwan; Johnson, Oliver (2015-03-05). "A proof of the Shepp-Olkin entropy concavity conjecture". Bernoulli. 23 (4B): 3638–3649. arXiv:1503.01570. doi:10.3150/16-BEJ860. S2CID 8358662.
- Hillion, Erwan; Johnson, Oliver (2019-11-09). "A proof of the Shepp-Olkin entropy monotonicity conjecture". Electronic Journal of Probability. 24 (126): 1–14. doi:10.1214/19-EJP380.
- Hong, Yili (March 2013). "On computing the distribution function for the Poisson binomial distribution". Computational Statistics & Data Analysis. 59: 41–51. doi:10.1016/j.csda.2012.10.006.