Why has cross entropy, rather than Kullback-Leibler divergence, become the standard loss function for classification?

15

Cross entropy is identical to the KL divergence plus the entropy of the target distribution. KL equals zero when the two distributions are the same, which seems more intuitive to me than the entropy of the target distribution, which is what cross entropy equals on a match.

I'm not saying that one carries more information than the other, only that a human may find zero more intuitive than a positive number. Of course, one usually uses a separate evaluation method to see how well the classification actually performs. But is the choice of cross entropy over KL historical?

machine-learning classification

— Josh Albert
source

12

When it comes to classification problems in machine learning, the cross-entropy and the KL divergence are equal. As already stated in the question, the general formula is:

$H(p, q) = H(p) + D_{KL}(p \| q)$

where $p$ is the “true” distribution, $q$ is the estimated distribution, $H(p, q)$ is the cross-entropy, $H(p)$ is the entropy of $p$, and $D_{KL}$ is the Kullback-Leibler divergence.
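As a quick numerical sanity check of this identity, here is a minimal sketch in plain Python. The helper names (`entropy`, `cross_entropy`, `kl_divergence`) and the example distributions are illustrative assumptions, not part of the original answer; natural logarithms are used, so all quantities are in nats.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in nats; terms with p_i == 0 contribute 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross-entropy H(p, q) in nats; assumes q_i > 0 wherever p_i > 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL divergence D_KL(p || q) in nats; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative values only (hypothetical, chosen for this sketch).
p = [0.7, 0.2, 0.1]  # "true" distribution
q = [0.5, 0.3, 0.2]  # estimated distribution

lhs = cross_entropy(p, q)
rhs = entropy(p) + kl_divergence(p, q)
print(lhs, rhs)  # the two values should agree up to floating-point error
```

With a non-degenerate $p$ like this one, $H(p) > 0$, so the cross-entropy is strictly larger than the KL divergence even though the two differ only by a term that does not involve $q$.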

Note that in machine learning, $p$ is a one-hot representation of the ground-truth class, i.e.,

$p = [0, \ldots, 1, \ldots, 0]$

which is basically a delta-function distribution. But the entropy of a delta function is zero, hence the KL divergence simply equals the cross-entropy.
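The one-hot case can be checked the same way. The sketch below is only an illustration under assumed values (a three-class problem with the true class at index 1); the helper functions are hypothetical names defined here, not a library API. It shows that $H(p) = 0$, so the cross-entropy, the KL divergence, and the negative log-probability of the true class all coincide.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in nats; terms with p_i == 0 contribute 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross-entropy H(p, q) in nats; assumes q_i > 0 wherever p_i > 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL divergence D_KL(p || q) in nats; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example: three classes, the ground-truth class is index 1.
true_class = 1
p_one_hot = [0.0, 1.0, 0.0]
q = [0.2, 0.7, 0.1]  # assumed model prediction

h_p = entropy(p_one_hot)            # entropy of a delta distribution
ce = cross_entropy(p_one_hot, q)    # cross-entropy loss
kl = kl_divergence(p_one_hot, q)    # KL divergence loss
nll = -math.log(q[true_class])      # negative log-likelihood of the true class

print(h_p, ce, kl, nll)
```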

In fact, even if $H(p)$ were not $0$ (e.g., with soft labels), it is fixed (it does not depend on the model's parameters) and contributes nothing to the gradient. In terms of optimization, it is safe to simply drop it and optimize the Kullback-Leibler divergence.
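To illustrate the gradient claim, the following sketch compares finite-difference gradients of the two losses with respect to the logits of a softmax model, using soft labels. Everything here (the softmax parameterization, the `numerical_grad` helper, the label and logit values) is an assumption made for the example rather than something stated in the answer.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, q):
    """Cross-entropy H(p, q) in nats; assumes q_i > 0 wherever p_i > 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL divergence D_KL(p || q) in nats; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def numerical_grad(loss, z, eps=1e-6):
    """Central finite-difference gradient of loss(z) with respect to each logit."""
    grad = []
    for i in range(len(z)):
        z_plus = z[:i] + [z[i] + eps] + z[i + 1:]
        z_minus = z[:i] + [z[i] - eps] + z[i + 1:]
        grad.append((loss(z_plus) - loss(z_minus)) / (2 * eps))
    return grad

# Hypothetical soft labels (H(p) > 0) and logits.
p_soft = [0.8, 0.15, 0.05]
logits = [1.0, -0.5, 0.3]

grad_ce = numerical_grad(lambda z: cross_entropy(p_soft, softmax(z)), logits)
grad_kl = numerical_grad(lambda z: kl_divergence(p_soft, softmax(z)), logits)

# For a softmax model both gradients equal softmax(logits) - p.
analytic = [qi - pi for qi, pi in zip(softmax(logits), p_soft)]

print(grad_ce)
print(grad_kl)
print(analytic)
```

The loss values themselves differ by $H(p)$, but since that offset is constant in the logits, the three gradient vectors should match up to finite-difference error.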

— Maxim
source

0

Cross-entropy is an entropy, not an entropy difference.

A more natural and perhaps intuitive way to conceptualize the categorization criteria is through a relation rather than a definition.

$H(P, Q) - H(P) = D_{\mathrm{KL}}(P\|Q) = - \sum_i P(i) \log\frac{Q(i)}{P(i)}$

This follows parallels, identified by Claude Shannon with John von Neumann, between quantum mechanical thermodynamics and information theory. Entropy is not an absolute quantity. It is a relative one, so neither entropy nor cross entropy can be calculated, but their difference can be for either the discrete case above or its continuous sibling below.

$H(P, Q) - H(P) = D_{\mathrm{KL}}(P\|Q) = - \int_{-\infty}^\infty \, p(x) \log\frac {q(x)} {p(x)} \, dx$

Although we may see $H(...) = ...$ in the literature, with no $H'(...)$ on the right-hand side of the equation, this is not technically accurate. In such cases there is always some implied entropy to which the entropy on the left-hand side is relative.

— FauChristian
source