Kullback-Leibler divergence

In probability theory and information theory, the Kullback-Leibler divergence, or relative entropy, is a quantity that measures the difference between two probability distributions. It is named after Solomon Kullback and Richard Leibler, two NSA mathematicians. The term "divergence" is a misnomer; it is unrelated to the divergence of vector calculus. One might be tempted to call it a "distance metric", but this would also be a misnomer, as the Kullback-Leibler divergence is not symmetric and does not satisfy the triangle inequality.

The Kullback-Leibler divergence between two probability distributions p and q is defined as

\mathit{KL}(p,q) = \sum_x p(x) \log \frac{p(x)}{q(x)}

for distributions of a discrete variable, and as

\mathit{KL}(p,q) = \int_{-\infty}^{\infty} p(x) \log \frac{p(x)}{q(x)} \, dx

for distributions of a continuous variable.
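The discrete definition can be sketched in a few lines of Python. This is only an illustration: the function name kl_divergence and the example distributions are hypothetical, and the handling of zero probabilities follows the usual conventions (a term with p(x) = 0 contributes nothing, and the divergence is infinite if q(x) = 0 where p(x) > 0), which the definition above does not spell out.

```python
import math

def kl_divergence(p, q):
    """KL(p, q) in nats for two discrete distributions.

    p and q are equal-length sequences of probabilities over the same outcomes.
    Assumed conventions: terms with p(x) == 0 contribute 0, and the result
    is infinite if q(x) == 0 for some x with p(x) > 0.
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined over the same outcomes")
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue
        if qx == 0.0:
            return math.inf
        total += px * math.log(px / qx)
    return total

# Hypothetical example distributions over two outcomes.
p = [0.5, 0.5]
q = [0.9, 0.1]

print(kl_divergence(p, q))  # approximately 0.511
print(kl_divergence(q, p))  # approximately 0.368
```

The two printed values differ, which illustrates the lack of symmetry noted in the introduction.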

It can be seen from the definition that

\mathit{KL}(p,q) = -\sum_x p(x) \log q(x) + \sum_x p(x) \log p(x) = H(p,q) - H(p)

where H(p,q) denotes the cross-entropy of p and q, and H(p) the entropy of p. As the cross-entropy is always greater than or equal to the entropy (Gibbs' inequality), this shows that the Kullback-Leibler divergence is nonnegative; furthermore, KL(p,q) is zero if and only if p = q.
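The decomposition KL(p,q) = H(p,q) - H(p) can be checked numerically. The following is a minimal sketch, assuming natural logarithms and a q that is strictly positive wherever p is; the helper names and the example distributions are hypothetical.

```python
import math

def entropy(p):
    """H(p) = -sum_x p(x) log p(x), in nats."""
    return -sum(px * math.log(px) for px in p if px > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log q(x), in nats (assumes q(x) > 0 where p(x) > 0)."""
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0)

def kl(p, q):
    """KL(p, q) computed directly from the definition, in nats."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

# Hypothetical example distributions over three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

direct = kl(p, q)
via_entropies = cross_entropy(p, q) - entropy(p)

print(direct, via_entropies)               # the two values agree up to rounding
print(cross_entropy(p, q) >= entropy(p))   # True, so KL(p, q) >= 0
print(kl(p, p))                            # 0.0 when the distributions coincide
```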

In coding theory, the KL divergence can be interpreted as the extra message length per datum needed to send messages distributed as p when the messages are encoded using a code that is optimal for distribution q.
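The coding interpretation can be illustrated with a small worked example for KL(p,q) as defined above. The sketch assumes idealised code lengths of -log2 of the probability (in bits) and a hypothetical four-symbol source; the symbols are drawn from p, while the code in use is the one that would be ideal for q.

```python
import math

# Hypothetical source: four symbols actually drawn from p.
p = [0.5, 0.25, 0.125, 0.125]
# The code in use was designed for q (here uniform, so every codeword has 2 bits).
q = [0.25, 0.25, 0.25, 0.25]

# Assumption: an ideal code for a distribution r assigns -log2 r(x) bits to symbol x.
optimal_bits = sum(px * -math.log2(px) for px in p)                 # entropy of p
mismatched_bits = sum(px * -math.log2(qx) for px, qx in zip(p, q))  # cross-entropy

extra_bits = mismatched_bits - optimal_bits
kl_bits = sum(px * math.log2(px / qx) for px, qx in zip(p, q))

print(optimal_bits)     # 1.75 bits per symbol with the code matched to p
print(mismatched_bits)  # 2.0 bits per symbol with the code matched to q
print(extra_bits)       # 0.25 extra bits per symbol
print(kl_bits)          # 0.25, equal to the extra cost
```

Base-2 logarithms are used here so that the result is expressed in bits; with natural logarithms the same quantities would be in nats.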

In Bayesian statistics, the KL divergence can be used as a measure of the "distance" between the prior distribution and the posterior distribution. If the logarithms are taken to base 2, the KL divergence is also the gain in Shannon information, in bits, involved in going from the prior to the posterior. In Bayesian experimental design, a design that is optimised to maximise the KL divergence between the prior and the posterior is said to be Bayes d-optimal.
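As a rough illustration of the information gain, consider a hypothetical experiment with two hypotheses about a coin (fair, or biased towards heads with probability 0.9) and a single observed head. The sketch below assumes the common convention of taking the posterior as the first argument and the prior as the second, that is KL(posterior, prior); the text above does not fix the order, so this is an assumption.

```python
import math

# Hypothetical setup: two hypotheses about a coin, equally likely a priori.
prior = [0.5, 0.5]             # P(fair), P(biased)
likelihood_heads = [0.5, 0.9]  # P(heads | fair), P(heads | biased)

# Bayes' rule after observing a single head.
unnormalised = [pr * lk for pr, lk in zip(prior, likelihood_heads)]
evidence = sum(unnormalised)
posterior = [u / evidence for u in unnormalised]

# Assumed convention: information gain = KL(posterior, prior), in bits.
info_gain_bits = sum(
    po * math.log2(po / pr) for po, pr in zip(posterior, prior) if po > 0
)

print(posterior)       # roughly [0.357, 0.643]
print(info_gain_bits)  # roughly 0.06 bits gained from one observation
```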

References

  • S. Kullback and R. A. Leibler. On information and sufficiency. Annals of Mathematical Statistics 22(1):79–86, March 1951.
The contents of this article are licensed from Wikipedia.org under the GNU Free Documentation License.