Bhattacharyya distance

In statistics, the Bhattacharyya distance measures the similarity of two probability distributions. It is closely related to the Bhattacharyya coefficient, which is a measure of the amount of overlap between two statistical samples or populations.

It is not a metric, despite being named a "distance", since it does not obey the triangle inequality.

History

Both the Bhattacharyya distance and the Bhattacharyya coefficient are named after Anil Kumar Bhattacharyya, a statistician who worked in the 1930s at the Indian Statistical Institute.[1] He developed the method to measure the distance between two non-normal distributions and illustrated this with the classical multinomial populations[2] as well as probability distributions that are absolutely continuous with respect to the Lebesgue measure.[3] The latter work appeared partly in 1943 in the Bulletin of the Calcutta Mathematical Society,[3] while the former part, despite being submitted for publication in 1941, appeared almost five years later in Sankhyā.[2][1]

Definition

For probability distributions $P$ and $Q$ on the same domain ${\mathcal {X}}$ , the Bhattacharyya distance is defined as

D_{B}(P,Q)=-\ln \left(BC(P,Q)\right)

where

BC(P,Q)=\sum _{x\in {\mathcal {X}}}{\sqrt {P(x)Q(x)}}

is the Bhattacharyya coefficient for discrete probability distributions.
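The discrete definition translates directly into code. The following is a minimal Python sketch, assuming each distribution is given as a list of probabilities listed in the same order over a finite domain; the function names are illustrative and not taken from any library.

```python
import math
from typing import Sequence


def bhattacharyya_coefficient(p: Sequence[float], q: Sequence[float]) -> float:
    """BC(P, Q): sum over the domain of sqrt(P(x) * Q(x))."""
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same domain")
    return sum(math.sqrt(px * qx) for px, qx in zip(p, q))


def bhattacharyya_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """D_B(P, Q) = -ln BC(P, Q); infinite when the supports are disjoint."""
    bc = bhattacharyya_coefficient(p, q)
    return math.inf if bc == 0.0 else -math.log(bc)


# Two distributions on a three-point domain, listed in the same order.
P = [0.5, 0.3, 0.2]
Q = [0.2, 0.3, 0.5]
print(bhattacharyya_coefficient(P, Q), bhattacharyya_distance(P, Q))
print(bhattacharyya_distance([1.0, 0.0], [0.0, 1.0]))  # disjoint supports: inf
```

Returning infinity for disjoint supports matches the property $0\leq D_{B}\leq \infty$ stated below.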

For continuous probability distributions, with $P(dx)=p(x)dx$ and $Q(dx)=q(x)dx$ where $p(x)$ and $q(x)$ are the probability density functions, the Bhattacharyya coefficient is defined as

BC(P,Q)=\int _{\mathcal {X}}{\sqrt {p(x)q(x)}}\,dx.
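For continuous distributions the integral usually has to be evaluated numerically. The sketch below uses a plain midpoint rule on a finite interval, which is an assumption (both densities must be negligible outside it); the helper `bc_continuous` and the constants are hypothetical. As a check, two exponential densities with rates $a$ and $b$ have $BC=2{\sqrt {ab}}/(a+b)$, obtained by integrating ${\sqrt {ab}}\,e^{-(a+b)x/2}$ over $[0,\infty )$.

```python
import math
from typing import Callable


def bc_continuous(p: Callable[[float], float], q: Callable[[float], float],
                  lo: float, hi: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of sqrt(p(x) q(x)) over [lo, hi].

    Assumes both densities are negligible outside [lo, hi].
    """
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += math.sqrt(p(x) * q(x))
    return total * h


RATE_P, RATE_Q = 1.0, 4.0


def p(x: float) -> float:
    """Exponential density with rate RATE_P on [0, inf)."""
    return RATE_P * math.exp(-RATE_P * x)


def q(x: float) -> float:
    """Exponential density with rate RATE_Q on [0, inf)."""
    return RATE_Q * math.exp(-RATE_Q * x)


approx = bc_continuous(p, q, 0.0, 50.0)
exact = 2 * math.sqrt(RATE_P * RATE_Q) / (RATE_P + RATE_Q)  # closed form, 0.8 here
print(approx, exact)
```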

More generally, given two probability measures $P,Q$ on a measurable space $({\mathcal {X}},{\mathcal {B}})$, let $\lambda$ be a $\sigma$-finite measure such that $P$ and $Q$ are absolutely continuous with respect to $\lambda$, i.e. such that $P(dx)=p(x)\lambda (dx)$ and $Q(dx)=q(x)\lambda (dx)$ for probability density functions $p,q$ with respect to $\lambda$, defined $\lambda$-almost everywhere. Such a measure, even such a probability measure, always exists, e.g. $\lambda ={\tfrac {1}{2}}(P+Q)$. Then define the Bhattacharyya measure on $({\mathcal {X}},{\mathcal {B}})$ by

bc(dx|P,Q)={\sqrt {p(x)q(x)}}\,\lambda (dx)={\sqrt {{\frac {P(dx)}{\lambda (dx)}}(x){\frac {Q(dx)}{\lambda (dx)}}(x)}}\lambda (dx).

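It does not depend on the choice of $\lambda$: if $\lambda '$ is another such measure, with $p',q'$ the densities of $P,Q$ with respect to $\lambda '$, choose a measure $\mu$ with respect to which both $\lambda$ and $\lambda '$ are absolutely continuous (for example $\mu =\lambda +\lambda '$), i.e. $\lambda (dx)=l(x)\mu (dx)$ and $\lambda '(dx)=l'(x)\mu (dx)$. Then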

P(dx)=p(x)\lambda (dx)=p'(x)\lambda '(dx)=p(x)l(x)\mu (dx)=p'(x)l'(x)\mu (dx),

and similarly for $Q$, so that $p(x)l(x)=p'(x)l'(x)$ and $q(x)l(x)=q'(x)l'(x)$ for $\mu$-almost every $x$. We then have

bc(dx|P,Q)={\sqrt {p(x)q(x)}}\,\lambda (dx)={\sqrt {p(x)q(x)}}\,l(x)\mu (dx)={\sqrt {p(x)l(x)q(x)\,l(x)}}\,\mu (dx)={\sqrt {p'(x)l'(x)q'(x)l'(x)}}\,\mu (dx)={\sqrt {p'(x)q'(x)}}\,\lambda '(dx).
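The independence from $\lambda$ can be checked numerically on a finite domain, where any measure is simply a list of non-negative weights. In this hypothetical sketch, `bc_with_reference` forms the densities $p=dP/d\lambda$ and $q=dQ/d\lambda$ for three different choices of $\lambda$ (counting measure, ${\tfrac {1}{2}}(P+Q)$, and arbitrary positive weights) and should return the same value each time, up to floating-point rounding.

```python
import math
from typing import Sequence


def bc_with_reference(P: Sequence[float], Q: Sequence[float],
                      lam: Sequence[float]) -> float:
    """Sum over x of sqrt(p(x) q(x)) * lam(x), with p = dP/dlam and q = dQ/dlam.

    Points where lam(x) == 0 are skipped: absolute continuity of P and Q with
    respect to lam forces P(x) = Q(x) = 0 there.
    """
    total = 0.0
    for P_x, Q_x, lam_x in zip(P, Q, lam):
        if lam_x > 0:
            p_x, q_x = P_x / lam_x, Q_x / lam_x  # densities with respect to lam
            total += math.sqrt(p_x * q_x) * lam_x
    return total


P = [0.5, 0.3, 0.2, 0.0]
Q = [0.1, 0.2, 0.3, 0.4]
references = {
    "counting measure": [1.0, 1.0, 1.0, 1.0],
    "(P + Q) / 2": [(a + b) / 2 for a, b in zip(P, Q)],
    "arbitrary positive weights": [0.7, 2.0, 0.05, 3.0],
}
for name, lam in references.items():
    print(name, bc_with_reference(P, Q, lam))
```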

We finally define the Bhattacharyya coefficient

BC(P,Q)=\int _{\mathcal {X}}bc(dx|P,Q)=\int _{\mathcal {X}}{\sqrt {p(x)q(x)}}\,\lambda (dx).

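By the above, the quantity $BC(P,Q)$ does not depend on $\lambda$, and by the Cauchy–Schwarz inequality $0\leq BC(P,Q)\leq \left(\int _{\mathcal {X}}p\,d\lambda \right)^{1/2}\left(\int _{\mathcal {X}}q\,d\lambda \right)^{1/2}=1$. In particular, if $P$ is absolutely continuous with respect to $Q$ with Radon–Nikodym derivative $p(x)={\frac {P(dx)}{Q(dx)}}(x)$, so that $P(dx)=p(x)Q(dx)$, then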

BC(P,Q)=\int _{\mathcal {X}}{\sqrt {p(x)}}Q(dx)=\int _{\mathcal {X}}{\sqrt {\frac {P(dx)}{Q(dx)}}}Q(dx)=E_{Q}\left[{\sqrt {\frac {P(dx)}{Q(dx)}}}\right]
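The expectation form suggests a Monte Carlo estimator: draw samples from $Q$ and average ${\sqrt {P(dx)/Q(dx)}}$. The sketch below assumes both distributions have densities, so the Radon–Nikodym derivative is the density ratio; the choice $P={\mathcal {N}}(1,1)$, $Q={\mathcal {N}}(0,1)$ and all function names are illustrative. By the normal-distribution formula in the Properties section, the exact value here is $e^{-1/8}\approx 0.8825$.

```python
import math
import random
from typing import Callable


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def bc_monte_carlo(p_pdf: Callable[[float], float], q_pdf: Callable[[float], float],
                   sample_q: Callable[[random.Random], float],
                   n: int = 100_000, seed: int = 0) -> float:
    """Estimate BC(P, Q) = E_Q[sqrt(dP/dQ)] by averaging over n draws from Q.

    Assumes P is absolutely continuous w.r.t. Q, so dP/dQ(x) = p_pdf(x) / q_pdf(x).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = sample_q(rng)  # one draw from Q
        total += math.sqrt(p_pdf(x) / q_pdf(x))
    return total / n


# P = N(1, 1), Q = N(0, 1): D_B = 1/8, so BC = exp(-1/8).
estimate = bc_monte_carlo(lambda x: normal_pdf(x, 1.0, 1.0),
                          lambda x: normal_pdf(x, 0.0, 1.0),
                          lambda rng: rng.gauss(0.0, 1.0))
print(estimate, math.exp(-1 / 8))
```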

Properties

$0\leq BC\leq 1$ and $0\leq D_{B}\leq \infty$ .

$D_{B}$ does not obey the triangle inequality, though the Hellinger distance ${\sqrt {1-BC(p,q)}}$ does.
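A small example on a two-element domain makes the failure of the triangle inequality concrete. The three distributions below are chosen for illustration: $D_{B}(P,R)=-\ln 0.6$ exceeds $D_{B}(P,Q)+D_{B}(Q,R)=-\ln 0.8$, while the Hellinger distances of the same triple satisfy the inequality. The helper names are hypothetical.

```python
import math


def bc(p, q):
    """Discrete Bhattacharyya coefficient."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def d_b(p, q):
    """Bhattacharyya distance (assumes overlapping supports, so bc > 0)."""
    return -math.log(bc(p, q))


def hellinger(p, q):
    """Hellinger distance sqrt(1 - BC); max() guards against rounding below 0."""
    return math.sqrt(max(1.0 - bc(p, q), 0.0))


P, Q, R = [0.9, 0.1], [0.5, 0.5], [0.1, 0.9]
print(d_b(P, R), d_b(P, Q) + d_b(Q, R))                    # about 0.511 vs 0.223
print(hellinger(P, R), hellinger(P, Q) + hellinger(Q, R))  # about 0.632 vs 0.650
```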

Let $p\sim {\mathcal {N}}(\mu _{p},\sigma _{p}^{2})$ , $q\sim {\mathcal {N}}(\mu _{q},\sigma _{q}^{2})$ , where ${\mathcal {N}}(\mu ,\sigma ^{2})$ is the normal distribution with mean $\mu$ and variance $\sigma ^{2}$ ; then

D_{B}(p,q)={\frac {1}{4}}{\frac {(\mu _{p}-\mu _{q})^{2}}{\sigma _{p}^{2}+\sigma _{q}^{2}}}+{\frac {1}{2}}\ln \left({\frac {\sigma _{p}^{2}+\sigma _{q}^{2}}{2\sigma _{p}\sigma _{q}}}\right).
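A direct transcription of the univariate formula, with a numerical cross-check against $-\ln$ of a midpoint-rule approximation of $\int {\sqrt {p(x)q(x)}}\,dx$. This is a sketch: the integration range of 12 standard deviations around each mean and the function names are assumptions made for illustration.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def bhattacharyya_normal(mu_p: float, sigma_p: float, mu_q: float, sigma_q: float) -> float:
    """Closed-form D_B between N(mu_p, sigma_p**2) and N(mu_q, sigma_q**2)."""
    var_sum = sigma_p ** 2 + sigma_q ** 2
    mean_term = 0.25 * (mu_p - mu_q) ** 2 / var_sum
    var_term = 0.5 * math.log(var_sum / (2 * sigma_p * sigma_q))
    return mean_term + var_term


def bhattacharyya_numeric(mu_p, sigma_p, mu_q, sigma_q, n: int = 20_000) -> float:
    """-ln of a midpoint-rule approximation of the integral of sqrt(p q)."""
    lo = min(mu_p - 12 * sigma_p, mu_q - 12 * sigma_q)
    hi = max(mu_p + 12 * sigma_p, mu_q + 12 * sigma_q)
    h = (hi - lo) / n
    bc = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h  # midpoint of the i-th subinterval
        bc += math.sqrt(normal_pdf(x, mu_p, sigma_p) * normal_pdf(x, mu_q, sigma_q))
    return -math.log(bc * h)


print(bhattacharyya_normal(0.0, 1.0, 1.5, 2.0), bhattacharyya_numeric(0.0, 1.0, 1.5, 2.0))
```

With equal variances the logarithmic term vanishes and $D_{B}=(\mu _{p}-\mu _{q})^{2}/(8\sigma ^{2})$.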

And in general, given two multivariate normal distributions $p_{i}={\mathcal {N}}({\boldsymbol {\mu }}_{i},\,{\boldsymbol {\Sigma }}_{i})$ ,

D_{B}(p_{1},p_{2})={1 \over 8}({\boldsymbol {\mu }}_{1}-{\boldsymbol {\mu }}_{2})^{T}{\boldsymbol {\Sigma }}^{-1}({\boldsymbol {\mu }}_{1}-{\boldsymbol {\mu }}_{2})+{1 \over 2}\ln \,\left({\det {\boldsymbol {\Sigma }} \over {\sqrt {\det {\boldsymbol {\Sigma }}_{1}\,\det {\boldsymbol {\Sigma }}_{2}}}}\right),

where ${\boldsymbol {\Sigma }}={{\boldsymbol {\Sigma }}_{1}+{\boldsymbol {\Sigma }}_{2} \over 2}.$ [4] The first term is one eighth of the squared Mahalanobis distance between the two means, computed with respect to ${\boldsymbol {\Sigma }}$.
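The multivariate formula can be sketched with NumPy (an assumed dependency). Two design choices are worth noting: the quadratic form uses `numpy.linalg.solve` instead of an explicit inverse, and the determinants are handled as log-determinants via `numpy.linalg.slogdet`, which is more stable in high dimensions. The function name is hypothetical.

```python
import numpy as np


def bhattacharyya_mvn(mu1, cov1, mu2, cov2) -> float:
    """D_B between N(mu1, cov1) and N(mu2, cov2) via the closed-form expression."""
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    cov1, cov2 = np.asarray(cov1, dtype=float), np.asarray(cov2, dtype=float)
    cov = (cov1 + cov2) / 2
    diff = mu1 - mu2
    # Solve cov @ x = diff rather than forming the inverse explicitly.
    mean_term = diff @ np.linalg.solve(cov, diff) / 8
    # slogdet returns (sign, log|det|), avoiding overflow/underflow of det itself.
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    cov_term = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return float(mean_term + cov_term)


# Shifted means, identical covariances: only the Mahalanobis term contributes.
print(bhattacharyya_mvn([0, 0], np.eye(2), [1, 0], np.eye(2)))
# Equal means, covariances I and 4I: only the log-determinant term contributes.
print(bhattacharyya_mvn([0, 0], np.eye(2), [0, 0], 4 * np.eye(2)))
```

The first call should give $1/8$ and the second $\ln 1.25$. With diagonal covariance matrices the result equals the sum of the univariate distances over the coordinates.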

Applications

The Bhattacharyya coefficient quantifies the "closeness" of two random statistical samples.

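Given two samples drawn from distributions $P$ and $Q$, partition the domain into the same $n$ buckets for both samples, and let $p_{i}$ be the relative frequency of the samples from $P$ that fall in bucket $i$ (the count in bucket $i$ divided by the sample size), and similarly for $q_{i}$. The sample Bhattacharyya coefficient is then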

BC(\mathbf {p} ,\mathbf {q} )=\sum _{i=1}^{n}{\sqrt {p_{i}q_{i}}},

which is an estimator of $BC(P,Q)$. The quality of the estimate depends on the choice of buckets: too few buckets tend to overestimate $BC(P,Q)$, while too many tend to underestimate it.
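A sketch of the histogram estimator, assuming equal-width buckets spanning the pooled range of both samples (other bucketing schemes are possible); the function names are hypothetical. The sample data are drawn from ${\mathcal {N}}(0,1)$ and ${\mathcal {N}}(1,1)$ purely for illustration, so the value being estimated is $e^{-1/8}\approx 0.8825$. A single bucket always returns exactly 1, the extreme case of overestimation, while a very large number of buckets leaves most occupied buckets holding samples from only one of the two sequences, which drives the estimate down.

```python
import math
import random
from typing import List, Sequence


def histogram(samples: Sequence[float], lo: float, hi: float,
              n_bins: int) -> List[float]:
    """Relative frequencies of `samples` in n_bins equal-width buckets on [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for x in samples:
        i = int((x - lo) / width)
        counts[min(max(i, 0), n_bins - 1)] += 1  # clamp so x == hi lands in the last bucket
    return [c / len(samples) for c in counts]


def sample_bc(xs: Sequence[float], ys: Sequence[float], n_bins: int) -> float:
    """Sample Bhattacharyya coefficient using the same buckets for both samples."""
    lo = min(min(xs), min(ys))
    hi = max(max(xs), max(ys))
    if hi == lo:  # every value identical: both samples share one bucket
        return 1.0
    p = histogram(xs, lo, hi, n_bins)
    q = histogram(ys, lo, hi, n_bins)
    return sum(math.sqrt(p_i * q_i) for p_i, q_i in zip(p, q))


rng = random.Random(42)
xs = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # sample from P = N(0, 1)
ys = [rng.gauss(1.0, 1.0) for _ in range(2000)]  # sample from Q = N(1, 1)
print("target", math.exp(-1 / 8))
for n_bins in (1, 20, 5000):
    print(n_bins, sample_bc(xs, ys, n_bins))
```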

A common task in classification is estimating the separability of classes. Up to a multiplicative factor, the squared Mahalanobis distance is a special case of the Bhattacharyya distance when the two classes are normally distributed with the same covariance matrix: the log-determinant term then vanishes, and $D_{B}$ is one eighth of the squared Mahalanobis distance. When two classes have similar means but significantly different variances, the Mahalanobis distance is close to zero, whereas the Bhattacharyya distance is not.
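The univariate case illustrates the point. The hypothetical helper below splits $D_{B}$ into its two terms: one eighth of the squared Mahalanobis distance (measured with the averaged variance, as in the multivariate formula) and the variance term. With equal variances the second term vanishes; with nearly equal means but very different spreads the first term is negligible while the second is not.

```python
import math


def bhattacharyya_terms(mu1: float, s1: float, mu2: float, s2: float):
    """Return (mean term, variance term) of D_B for two univariate normals.

    The mean term is 1/8 of the squared Mahalanobis distance between the means,
    measured with the averaged variance (s1**2 + s2**2) / 2.
    """
    var_avg = (s1 ** 2 + s2 ** 2) / 2
    mahalanobis_sq = (mu1 - mu2) ** 2 / var_avg
    return mahalanobis_sq / 8, 0.5 * math.log(var_avg / (s1 * s2))


# Equal variances: the variance term is 0, so D_B = Mahalanobis**2 / 8.
print(bhattacharyya_terms(0.0, 1.0, 2.0, 1.0))
# Nearly equal means, very different spreads: the mean term is close to 0,
# but the variance term keeps D_B well above 0.
print(bhattacharyya_terms(0.0, 1.0, 0.01, 10.0))
```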

The Bhattacharyya coefficient is used in the construction of polar codes.[5]
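In Arıkan's construction,[5] the quantity used for a binary-input channel $W$ is the Bhattacharyya parameter $Z(W)=\sum _{y}{\sqrt {W(y|0)W(y|1)}}$, i.e. the Bhattacharyya coefficient between the output distributions for inputs 0 and 1. The sketch below computes it for a binary symmetric channel and a binary erasure channel, then applies the polarization recursion $Z\mapsto (2Z-Z^{2},\,Z^{2})$, which is exact for the erasure channel. The channel parameters and function names are illustrative assumptions.

```python
import math
from typing import List, Sequence


def bhattacharyya_parameter(w0: Sequence[float], w1: Sequence[float]) -> float:
    """Z(W): Bhattacharyya coefficient between the output distributions W(.|0) and W(.|1)."""
    return sum(math.sqrt(a * b) for a, b in zip(w0, w1))


def polarize_bec(z: float, levels: int) -> List[float]:
    """Z-values of the 2**levels synthetic channels obtained from a BEC.

    For a BEC each polarization step is exact: Z -> (2Z - Z**2, Z**2).
    """
    zs = [z]
    for _ in range(levels):
        zs = [child for parent in zs for child in (2 * parent - parent ** 2, parent ** 2)]
    return zs


# Binary symmetric channel, crossover probability 0.1; outputs {0, 1}.
p = 0.1
z_bsc = bhattacharyya_parameter([1 - p, p], [p, 1 - p])  # equals 2*sqrt(p*(1-p))

# Binary erasure channel, erasure probability 0.5; outputs {0, erasure, 1}.
eps = 0.5
z_bec = bhattacharyya_parameter([1 - eps, eps, 0.0], [0.0, eps, 1 - eps])  # equals eps

zs = polarize_bec(z_bec, 3)
# Information bits go on the synthetic channels with the smallest Z values.
print(z_bsc, z_bec, sorted(zs))
```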

The Bhattacharyya distance is used in feature extraction and selection,[6] image processing,[7] speaker recognition,[8] and phone clustering.[9]

References

Sen, Pranab Kumar (1996). "Anil Kumar Bhattacharyya (1915-1996): A Reverent Remembrance". Calcutta Statistical Association Bulletin. 46 (3–4): 151–158. doi:10.1177/0008068319960301. S2CID 164326977.
Bhattacharyya, A. (1946). "On a Measure of Divergence between Two Multinomial Populations". Sankhyā. 7 (4): 401–406. JSTOR 25047882.
Bhattacharyya, A. (March 1943). "On a measure of divergence between two statistical populations defined by their probability distributions". Bulletin of the Calcutta Mathematical Society. 35: 99–109. MR 0010358.
Kashyap, Ravi (2019). "The Perfect Marriage and Much More: Combining Dimension Reduction, Distance Measures and Covariance". Physica A: Statistical Mechanics and its Applications. 536: 120938. doi:10.1016/j.physa.2019.04.174.
Arıkan, Erdal (July 2009). "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels". IEEE Transactions on Information Theory. 55 (7): 3051–3073. arXiv:0807.3917. doi:10.1109/TIT.2009.2021379. S2CID 889822.
Choi, Euisun; Lee, Chulhee (August 2003). "Feature extraction based on the Bhattacharyya distance". Pattern Recognition. 36 (8): 1703–1709.
Goudail, François; Réfrégier, Philippe; Delyon, Guillaume (2004). "Bhattacharyya distance as a contrast parameter for statistical processing of noisy optical images". JOSA A. 21 (7): 1231–1240.
You, Chang Huai. "An SVM Kernel With GMM-Supervector Based on the Bhattacharyya Distance for Speaker Recognition". IEEE Signal Processing Letters. 16 (1): 49–52.
Mak, B. (3–6 October 1996). "Phone clustering using the Bhattacharyya distance". Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP 96). Vol. 4. pp. 2005–2008.

Nielsen, F.; Boltz, S. (2010). "The Burbea–Rao and Bhattacharyya centroids". IEEE Transactions on Information Theory. 57 (8): 5455–5466. arXiv:1004.5049. doi:10.1109/TIT.2011.2159046. S2CID 14238708.

Kailath, T. (1967). "The Divergence and Bhattacharyya Distance Measures in Signal Selection". IEEE Transactions on Communication Technology. 15 (1): 52–60. doi:10.1109/TCOM.1967.1089532.

Djouadi, A.; Snorrason, O.; Garber, F. (1990). "The quality of Training-Sample estimates of the Bhattacharyya coefficient". IEEE Transactions on Pattern Analysis and Machine Intelligence. 12 (1): 92–97. doi:10.1109/34.41388.

External links

For a short list of properties, see: http://www.mtm.ufsc.br/~taneja/book/node20.html

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
