2014-12-02 117 views
9

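Calculating entropy

I have spent several hours trying to calculate entropy, and I know I am missing something. Hopefully someone here can give me an idea!

Edit: I think my formula is wrong!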

CODE:

info <- function(CLASS.FREQ){
  freq.class <- CLASS.FREQ
  info <- 0
  for(i in seq_along(freq.class)){ # seq_along() also behaves for empty input, unlike 1:length()
    if(freq.class[[i]] != 0){ # skip empty classes, since log2(0) is -Inf
      entropy <- -sum(freq.class[[i]] * log2(freq.class[[i]])) # entropy contribution of class i
    }else{
      entropy <- 0
    }
    info <- info + entropy # sum the contributions of all classes
  }
  return(info)
}

I hope my post is clear, as this is my first time posting here.

This is my dataset:

buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no") 

credit <- c("fair", "excellent", "fair", "fair", "fair", "excellent", "excellent", "fair", "fair", "fair", "excellent", "excellent", "fair", "excellent") 

student <- c("no", "no", "no","no", "yes", "yes", "yes", "no", "yes", "yes", "yes", "no", "yes", "no") 

income <- c("high", "high", "high", "medium", "low", "low", "low", "medium", "low", "medium", "medium", "medium", "high", "medium") 

age <- c(25, 27, 35, 41, 48, 42, 36, 29, 26, 45, 23, 33, 37, 44) # we change the age from categorical to numeric 
+1

Ironically, of course, the worse the calculation, the closer it gets to the answer. – Strawberry 2014-12-02 16:58:51

+0

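It would help to post (a) the formula you believe is correct, and (b) an example of the kind of data you will pass to this function. Using `dput()` is a good way to share data. – Gregor 2014-12-02 17:01:20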

+0

What answer do you expect? Your code runs without error and correctly calculates the Shannon entropy. – cdeterman 2014-12-02 17:20:34

Answer

14

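Ultimately I find no error in your code; it runs as written. I think the part you are missing is the calculation of the class frequencies, and once you have those you will get your answer. Quickly running through the different objects you provide, I suspect you are looking at buys: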

buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no") 
freqs <- table(buys)/length(buys) 
info(freqs) 
[1] 0.940286 
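To make the arithmetic behind that number visible, here is a minimal sketch that breaks the sum into one term per class. The names `counts`, `p` and `terms` are introduced only for illustration and are not part of the original question or answer.

```r
buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no",
          "yes", "yes", "yes", "yes", "yes", "no")

counts <- table(buys)        # raw class counts: 5 "no", 9 "yes"
p <- counts / sum(counts)    # relative class frequencies (sum to 1)
terms <- -p * log2(p)        # one entropy contribution per class

terms                        # per-class contributions
sum(terms)                   # total Shannon entropy in bits
```

The key point is that the function expects relative frequencies, not the raw labels or raw counts.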

As for improving your code, you can simplify it dramatically: you do not need a loop if you are given a vector of class frequencies.

For example:

# calculate shannon-entropy 
-sum(freqs * log2(freqs)) 
[1] 0.940286 
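One caveat with the one-liner: a frequency of exactly zero gives `0 * log2(0)`, which is `NaN` in R, whereas the original loop skipped such classes. A possible sketch that keeps the zero check while staying vectorised is shown below; the function name `shannon_entropy` is hypothetical, not something from the question or from a package.

```r
# Hypothetical helper: Shannon entropy (in bits) of a vector of
# relative frequencies, ignoring classes with zero frequency.
shannon_entropy <- function(p) {
  p <- p[p > 0]          # drop empty classes; by convention 0 * log2(0) counts as 0
  -sum(p * log2(p))
}

buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no",
          "yes", "yes", "yes", "yes", "yes", "no")
freqs <- table(buys) / length(buys)

shannon_entropy(freqs)         # same value as the one-liner for buys
shannon_entropy(c(0.5, 0.5))   # two equally likely classes
shannon_entropy(c(1, 0))       # a single non-empty class, no NaN
```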

As a side note, the function entropy.empirical in the entropy package lets you set the unit to log2, which allows some more flexibility. For example:

library(entropy) # assumes the entropy package is installed
entropy.empirical(freqs, unit="log2") 
[1] 0.940286 
+0

Thanks, your answer helped me understand this. – Codex 2014-12-02 17:52:36

+0

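@Codex, glad to help. Regarding your last comment, I simply copied your objects and ran the calculation on each one to find the right one. If this is satisfactory, please accept the answer. – cdeterman 2014-12-02 17:54:11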
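The "ran the calculation on each one" step described in that comment could look roughly like the sketch below. This is an assumption about how it might be done, not the commenter's actual code; `vars` and `entropy_of` are names made up for illustration, and the numeric `age` column is left out because it is not categorical.

```r
buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no",
          "yes", "yes", "yes", "yes", "yes", "no")
credit <- c("fair", "excellent", "fair", "fair", "fair", "excellent", "excellent",
            "fair", "fair", "fair", "excellent", "excellent", "fair", "excellent")
student <- c("no", "no", "no", "no", "yes", "yes", "yes", "no",
             "yes", "yes", "yes", "no", "yes", "no")
income <- c("high", "high", "high", "medium", "low", "low", "low", "medium",
            "low", "medium", "medium", "medium", "high", "medium")

# Hypothetical helper: entropy (in bits) of a vector of class labels.
entropy_of <- function(x) {
  p <- table(x) / length(x)   # relative class frequencies
  -sum(p * log2(p))           # table() only reports observed classes, so no zeros here
}

vars <- list(buys = buys, credit = credit, student = student, income = income)
sapply(vars, entropy_of)      # named vector with one entropy per variable
```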