In information theory, the cross entropy between two probability distributions p and q over the same set of events measures the average number of bits needed to identify an event drawn from p when the coding scheme is optimized for q rather than for p. Cross entropy is closely related to the Kullback-Leibler divergence (also known as relative entropy).
The cross entropy for two distributions p and q over the same probability space is defined as follows:
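$$H(p, q) = \operatorname{E}_p[-\log q] = H(p) + D_{\mathrm{KL}}(p \parallel q),$$
where H(p) is the entropy of p and D_KL(p ∥ q) is the Kullback-Leibler divergence of q from p.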
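For discrete p and q this means
$$H(p, q) = -\sum_x p(x) \log q(x).$$
The situation for continuous distributions is analogous, with p and q now denoting probability density functions:
$$H(p, q) = -\int_X p(x) \log q(x)\, dx.$$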
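As an illustration of the discrete case, here is a minimal Python sketch that evaluates the sum -Σ p(x) log q(x) directly. The function name `cross_entropy`, the list-based representation of the distributions, and the choice of base-2 logarithms (so the result is in bits) are assumptions made for this example, not part of the definition.

```python
import math

def cross_entropy(p, q):
    """Cross entropy H(p, q) in bits for two discrete distributions.

    p and q are sequences of probabilities over the same outcomes,
    listed in the same order (an assumption of this sketch).
    """
    if len(p) != len(q):
        raise ValueError("p and q must cover the same outcomes")
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue  # convention: 0 * log q(x) contributes nothing
        if qx == 0:
            return math.inf  # p puts mass where q puts none
        total -= px * math.log2(qx)
    return total

p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]
print(cross_entropy(p, q))  # 1.75 bits
print(cross_entropy(p, p))  # 1.5 bits, the entropy of p
```

Note that the cross entropy of p with itself is the entropy H(p), not zero.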
NB: The notation H(p,q) is sometimes also used for the joint entropy of p and q, so its meaning has to be determined from context.
When comparing a distribution q against a fixed reference distribution p, minimizing cross entropy is equivalent to minimizing KL divergence. With p fixed, the two quantities differ only by the additive constant H(p), which does not depend on q. Both attain their minimum when p = q, where the KL divergence is 0 and the cross entropy is H(p).
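The relationship between the two quantities can be checked numerically. The following is a rough sketch for small discrete distributions. The helper names (`entropy`, `cross_entropy`, `kl_divergence`) and the example distributions are hypothetical, and the code assumes q(x) > 0 wherever p(x) > 0.

```python
import math

def entropy(p):
    # H(p) = -sum_x p(x) log2 p(x), skipping zero-probability outcomes
    return -sum(px * math.log2(px) for px in p if px > 0)

def cross_entropy(p, q):
    # H(p, q) = -sum_x p(x) log2 q(x); assumes q(x) > 0 where p(x) > 0
    return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

def kl_divergence(p, q):
    # D_KL(p || q) = sum_x p(x) log2(p(x) / q(x)); same assumption on q
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.25, 0.25]  # fixed reference distribution
candidates = [
    [0.25, 0.25, 0.5],
    [1 / 3, 1 / 3, 1 / 3],
    [0.5, 0.25, 0.25],  # equal to p
]

for q in candidates:
    h_pq = cross_entropy(p, q)
    kl = kl_divergence(p, q)
    # The gap between the two is always H(p), regardless of q.
    print(f"H(p,q) = {h_pq:.4f}  KL = {kl:.4f}  difference = {h_pq - kl:.4f}")
```

For every candidate q the printed difference equals H(p) = 1.5 bits, and the candidate equal to p gives the smallest value of both quantities.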
See also