2015-04-04 48 views

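MATLAB function for entropy in a decision tree

Hello, is there a MATLAB guru out there? I started learning MATLAB about a month ago (after my trial license expired, I switched to Octave). I am writing a function, purely for educational purposes, that computes entropy (e.g. in the leaves of a decision tree), and I am stuck. I get the following error: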

>> my_entropy(cell3, false) 
f = -0 
f = 

    -0 -0 

f = 

    -0 -0 3 

error: my_entropy: A(I,J): column index out of bounds; value 2 out of bound 1 
error: called from: 
error: C:\big data\octave\my_entropy.m at line 29, column 13 

Update, 5 April 2015, following @Daniel's suggestion:

# The main difference between MATLAB's bundled entropy() function
# and this custom function is that the bundled one converts its input to uint8
# and is used mostly for signal processing,
# while I simply use a straightforward solution, useful e.g. for learning trees

function f = my_entropy(data, weighted)
    # the function accepts only cell arrays;
    # weighted tells whether to return one weighted average entropy
    # or a vector of entropies, one per bucket;
    # moreover, I treat vectors as the only representation of "buckets",
    # in other words, vector = bucket (leaf of a decision tree)
    if nargin < 2
        weighted = true;
    end;

    rows = @(x) size(x,1);
    cols = @(x) size(x,2);

    if weighted
        f = 0;
    else
        f = [];
    end;

    for r = 1:rows(data)

        for c = 1:cols(data{r}) # in most cases this will be 1:1

            omega = sum(data{r,c});
            epsilon = 0;

            for b = 1:cols(data{r,c})
                epsilon = epsilon + ((data{r,c}(b)/omega) * (log2(data{r,c}(b)/omega)));
            end;

            entropy = -epsilon;

            if weighted
                f = f + entropy
            else
                f = [f entropy]
            end;

        end;

    end;

end;

# test cases 

cell1 = { [16];[16];[2 2 2 2 2 2 2 2];[12];[16] } 
cell2 = { [16],[12];[16],[2];[2 2 2 2 2 2 2 2],[8 8];[12],[8 8];[16],[8 8] } 
cell3 = { [16];[16];[2 2 2 2 2 2 2 2];[12];[16] } 

For the input

c = { [16];[16];[2 2 2 2 2 2 2 2];[12];[16] } 

the result of my_entropy(c, false) should be

[0, 0, 3, 0, 0] 

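This picture may help to visualize it:

[Image: Marbles as data]

A bucket is a MATLAB vector, the whole palette is a MATLAB cell array, and the numbers are the counts of the different kinds of data. So in this picture the middle cell {2,2} has entropy 3, while the other buckets (cells) have entropy 0.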
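The expected values can be checked by hand. The following is a short worked derivation, assuming the usual Shannon entropy in bits, which is what the inner loop of my_entropy accumulates. The symbols n_i (count of kind i in a bucket) and N (total count in that bucket) are introduced here only for illustration.

```latex
H = -\sum_{i} \frac{n_i}{N} \log_2 \frac{n_i}{N}

% bucket [2 2 2 2 2 2 2 2]: N = 16, every n_i / N = 1/8
H = -8 \cdot \frac{1}{8} \cdot \log_2 \frac{1}{8} = -(-3) = 3

% bucket [16] (or [12]): a single kind, so n_1 / N = 1
H = -1 \cdot \log_2 1 = 0
```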

Any suggestions on how to fix it would be appreciated. Best regards! :)

Answer


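The error is here: for c = 1:cols(cell{r})

You want the number of columns of the cell array, which is cols(cell). What you wrote returns the number of columns of the r-th element of the cell array, so for the bucket with 8 entries the loop tries to index columns 2 to 8 of a cell array that has only one column.

You should also avoid variable names that shadow built-in functions, such as cell.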
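To make the fix concrete, here is a minimal Octave sketch of the function with the inner loop running over the columns of the cell array itself. The name my_entropy_fixed is hypothetical, the vectorised inner sum and the zero-count filter are assumptions added for illustration, and the weighted branch keeps the question's behaviour of summing the bucket entropies rather than computing a true weighted average.

```octave
1;  # Octave idiom: marks this file as a script so a function can be defined inline

function f = my_entropy_fixed(data, weighted)
    # data: cell array, each cell holds one vector of counts (one bucket)
    # weighted: true  -> sum of the bucket entropies (as in the question's code)
    #           false -> row vector with one entropy per bucket, in row-major order
    if nargin < 2
        weighted = true;
    end

    f = [];
    for r = 1:size(data, 1)
        for c = 1:size(data, 2)   # columns of the cell array, not of data{r}
            counts = data{r, c};
            # drop zero counts, since 0 * log2(0) would evaluate to NaN
            p = counts(counts > 0) / sum(counts);
            f(end + 1) = -sum(p .* log2(p));
        end
    end

    if weighted
        f = sum(f);
    end
end

# entropies are 0, 0, 3, 0, 0 (Octave may print the zeros as -0)
disp(my_entropy_fixed({ [16]; [16]; [2 2 2 2 2 2 2 2]; [12]; [16] }, false))
```

With the two-column test case cell2 from the question, the same function visits both buckets in every row, which the original loop bound could not do.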

+0

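Hmm. That is what I am doing: for each cell of my cell array, I process the data in that cell (one vector). As for the naming convention, I have updated the code accordingly. – oski86 2015-04-05 11:19:28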

+1

Try it with 'for c = 1:cols(cell)'; I get the expected result. – Daniel 2015-04-05 12:53:28