2012-11-13 43 views
5
function [ d ] = hcompare_KL(h1,h2) 
%This routine evaluates the Kullback-Leibler (KL) distance between histograms. 
%    Input:  h1, h2 - histograms 
%    Output: d – the distance between the histograms. 
%    Method: KL is defined as: KL(h1,h2) = sum_i h1(i) * log(h1(i) / h2(i)) 
%    Note, KL is not symmetric, so compute both directions. 
%    Take care not to divide by zero or take the log of zero: disregard entries of the sum for which H2(i) == 0. 

temp = sum(h1 .* log(h1 ./ h2)); 
temp(isinf(temp)) = 0; % this resolves where h1(i) == 0 
d1 = sum(temp); 

temp = sum(h2 .* log(h2 ./ h1)); % other direction of the comparison, since it's not symmetric 
temp(isinf(temp)) = 0; 
d2 = sum(temp); 

d = d1 + d2; 

end 

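My problem is that whenever h1(i) or h2(i) == 0 I get inf, which is expected. However, for the KL distance I am supposed to return 0 whenever h1 or h2 == 0. How can I do that without using a loop?

Kullback-Leibler (KL) distance between histograms - matlab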

+1

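It is really hard to help you if you don't ask a better question. If I don't know what the program is supposed to do, I can't find your error. Please provide an example input, and tell us what output you expect and what goes wrong. Does the function throw an error? Does it fail to return what you want? I have downvoted your question, but I will gladly change my vote if the question is improved. – Jonas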

+0

Hi @Jonas, thanks for your daily answers; as you can see, I am still learning. I will specify my question in the post. Sorry, and thanks. – Gilad

+0

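@jonas I edited my question, could you take a look at it? Suppose we have h1 = [0:9] and h2 = [1:10] as input: I run into trouble when there is a 0 in the input, because of log(0). – Gilad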
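To make the failure concrete, here is a small sketch using the example inputs h1 = 0:9 and h2 = 1:10 from the comment above. It assumes standard IEEE floating-point behavior, under which 0 * log(0) evaluates to NaN rather than Inf, so an isinf check would not catch it. The variable names terms, total, and totalMasked are illustrative and not part of the original code.

```matlab
h1 = 0:9;
h2 = 1:10;

% Per-bin terms of KL(h1,h2), before any summation
terms = h1 .* log(h1 ./ h2);

% The first bin is 0 * log(0) = 0 * (-Inf), which is NaN, not Inf
firstIsNaN = isnan(terms(1));
firstIsInf = isinf(terms(1));

% Summing first collapses everything into a single NaN,
% so masking after the sum cannot recover the valid bins
total = sum(terms);

% Masking the per-bin terms before summing keeps the valid bins
terms(~isfinite(terms)) = 0;
totalMasked = sum(terms);
```

The point of the sketch is the order of operations: the mask has to be applied to the per-bin vector, and it has to cover NaN as well as Inf.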

Answers

3

To avoid problems when any count is 0, I suggest you create an index that marks the "good" data points:

%# you may want to do some input testing, such as whether h1 and h2 are 
%# of the same size 

%# preassign the output 
d = zeros(size(h1)); 

%# create an index of the "good" data points 
goodIdx = h1>0 & h2>0; %# bin counts <0 are not good, either 

d1 = sum(h1(goodIdx) .* log(h1(goodIdx) ./ h2(goodIdx))); 
d2 = sum(h2(goodIdx) .* log(h2(goodIdx) ./ h1(goodIdx))); 

%# overwrite d only where we have actual data 
%# the rest remains zero 
d(goodIdx) = d1 + d2; 
+0

Yes, that is exactly what I did: I used h1(find(h1 == 0)) = 1, thanks – Gilad
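As a minimal sketch of the "good index" idea, the snippet below computes the symmetric distance as a single scalar, which is what the question's function header describes, rather than a per-bin vector. The sample histograms and the names p and q are made up for illustration.

```matlab
% Illustrative histograms; bins 1 and 2 each contain a zero count
h1 = [0 2 3 5];
h2 = [1 0 4 5];

% Keep only bins where both counts are strictly positive
goodIdx = h1 > 0 & h2 > 0;
p = h1(goodIdx);
q = h2(goodIdx);

d1 = sum(p .* log(p ./ q));   % KL(h1,h2) over the good bins
d2 = sum(q .* log(q ./ p));   % KL(h2,h1) over the good bins

d = d1 + d2;                  % scalar symmetric distance
```

Logical indexing does the work a loop would otherwise do: bins with a zero in either histogram never reach the division or the logarithm.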

2

I see some mistakes in your implementation. Please replace log with log2.
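For context, a short sketch of what switching from log to log2 changes. Since log2(x) equals log(x) / log(2), the two versions differ only by a constant factor, which corresponds to measuring the result in bits instead of nats. The sample distributions p and q are made up for illustration.

```matlab
% Illustrative normalized histograms with no empty bins
p = [0.2 0.3 0.5];
q = [0.4 0.4 0.2];

klNats = sum(p .* log(p ./ q));    % natural log: result in nats
klBits = sum(p .* log2(p ./ q));   % base-2 log: result in bits

% The two results should agree up to the factor log(2)
ratioError = abs(klBits - klNats / log(2));
```

Which base is appropriate depends on the units you want; the change of base does not by itself address the zero-count problem.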

1

Try using

d=sum(h1.*log2(h1+eps)-h1.*log2(h2+eps)) 

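Note that KL(h1,h2) is different from KL(h2,h1). In your case it is KL(h1,h2), right? I think your implementation is wrong: it is not the distance between h1 and h2. The KL distance between h1 and h2 is defined as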

KL(h1,h2) = sum(h1.*log(h1./h2)) = sum(h1.*log(h1) - h1.*log(h2)). 

So the correct implementation must be

d=sum(h1.*log2(h1+eps)-h1.*log2(h2+eps)) %KL(h1,h2) 

d=sum(h2.*log2(h2+eps)-h2.*log2(h1+eps)) %KL(h2,h1)
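A brief sketch of how the eps-based form behaves on empty bins, compared with skipping those bins entirely. It relies on MATLAB's eps being 2^-52, so log2(eps) is -52. The sample histograms and the names klEps, good, and klMasked are assumptions made for illustration.

```matlab
% Illustrative histograms: bin 2 is empty in h2, bin 3 is empty in h1
h1 = [0.5 0.5 0];
h2 = [0.5 0 0.5];

% eps-smoothed KL(h1,h2): a bin with h1 == 0 contributes 0, but a bin
% with h2 == 0 and h1 > 0 contributes a large finite penalty
klEps = sum(h1 .* log2(h1 + eps) - h1 .* log2(h2 + eps));

% Masked KL(h1,h2): bins with a zero in either histogram are dropped
good = h1 > 0 & h2 > 0;
klMasked = sum(h1(good) .* log2(h1(good) ./ h2(good)));
```

Both variants avoid Inf and NaN, but they are not interchangeable: adding eps penalizes bins where h2 is empty, while masking disregards them, which is what the comment in the question's function asks for.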