There's no closed form for the KL divergence between GMMs. You can easily do Monte Carlo, though. Recall that KL(p||q) = \int p(x) log(p(x)/q(x)) dx = E_p[log(p(x)/q(x))]. So:
    import numpy as np

    def gmm_kl(gmm_p, gmm_q, n_samples=10**5):
        # Monte Carlo estimate of KL(p || q): draw from p, average log p(x) - log q(x).
        # With sklearn's GaussianMixture, sample() returns (X, component_labels)
        # and score_samples() returns the per-sample log densities.
        X, _ = gmm_p.sample(n_samples)
        log_p_X = gmm_p.score_samples(X)
        log_q_X = gmm_q.score_samples(X)
        return log_p_X.mean() - log_q_X.mean()
(mean(log(p(x)/q(x))) = mean(log p(x) - log q(x)) = mean(log p(x)) - mean(log q(x)) is slightly cheaper to compute.)
You don't want to use scipy.stats.entropy; that's for discrete distributions.
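For contrast, here's a minimal sketch of what scipy.stats.entropy actually computes: the discrete KL divergence between two probability vectors (it normalizes its inputs to sum to 1), which is not the same thing as the KL between two continuous GMM densities. The vectors below are purely illustrative.

    from scipy.stats import entropy
    import numpy as np

    # Discrete KL divergence sum(pk * log(pk / qk)) between two probability vectors.
    pk = np.array([0.2, 0.5, 0.3])
    qk = np.array([0.1, 0.6, 0.3])
    print(entropy(pk, qk))  # in nats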
If you want the symmetrized and smoothed Jensen-Shannon divergence (KL(p||m) + KL(q||m)) / 2 with m = (p+q)/2 instead, it's pretty similar:
    def gmm_js(gmm_p, gmm_q, n_samples=10**5):
        # Samples from p, and log densities of p, q, and p + q (unnormalized mixture) at them.
        X, _ = gmm_p.sample(n_samples)
        log_p_X = gmm_p.score_samples(X)
        log_q_X = gmm_q.score_samples(X)
        log_mix_X = np.logaddexp(log_p_X, log_q_X)

        # Same with samples from q.
        Y, _ = gmm_q.sample(n_samples)
        log_p_Y = gmm_p.score_samples(Y)
        log_q_Y = gmm_q.score_samples(Y)
        log_mix_Y = np.logaddexp(log_p_Y, log_q_Y)

        # JS = (KL(p || m) + KL(q || m)) / 2, where log m = log_mix - log 2.
        return (log_p_X.mean() - (log_mix_X.mean() - np.log(2))
                + log_q_Y.mean() - (log_mix_Y.mean() - np.log(2))) / 2
(log_mix_X / log_mix_Y are actually the logs of twice the mixture densities; pulling that factor out of the mean operation saves a few flops.)
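A minimal usage sketch, assuming two sklearn GaussianMixture models fit on toy 1-D data (the datasets and component counts below are made up for illustration):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.RandomState(0)
    # Two illustrative bimodal datasets.
    data_p = np.concatenate([rng.normal(-2, 1.0, 500), rng.normal(3, 0.5, 500)])[:, None]
    data_q = np.concatenate([rng.normal(-1, 1.0, 500), rng.normal(4, 1.0, 500)])[:, None]

    gmm_p = GaussianMixture(n_components=2, random_state=0).fit(data_p)
    gmm_q = GaussianMixture(n_components=2, random_state=0).fit(data_q)

    print("KL(p||q) estimate:", gmm_kl(gmm_p, gmm_q))
    print("JS(p, q) estimate:", gmm_js(gmm_p, gmm_q))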
No closed form exists. See this paper for an approximation: http://scholar.google.co.kr/scholar?cluster=17600982039879101400&hl=ko&as_sdt=0,5&authuser=1 – emeth 2014-09-28 08:10:30