2013-03-21 55 views

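Converting a Weka classifier into a score

I am trying to convert my classifier's result from labelling instances as 0 or 1 into a score (a confidence?), for example between 0 and 10. I am using the RIDOR classifier, but I could also use ClassificationViaRegression, RandomForest or AttributeSelectedClassifier, although they do not classify as well.

I have printed everything I can to the terminal (with all options ticked), but I cannot find a confidence measure for the predictions anywhere. Also, as far as I understand, none of these classifiers has an option to output source code? In that case I would have to code the classifier by hand.

Here is an example of the rules produced: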

class = 2 (40536.0/20268.0) 
     Except (fog <= 14.115114) and (polySyllabicWords/Sentence <= 1.973684) and (polySyllabicWords/Sentence <= 1.245) and (Characters/Word > 4.331715) => class = 1 (2309.0/5.0) [1137.0/4.0] 
     Except (fog <= 14.115598) and (polySyllabicWords/Sentence <= 1.973684) and (polySyllabicWords/Sentence > 1.514706) => class = 1 (2281.0/0.0) [1112.0/0.0] 
     Except (fog <= 14.136126) and (Words/Sentence > 19.651515) and (polySyllableCount <= 10.5) and (polySyllabicWords/Sentence > 2.416667) and (Syllables/Sentence <= 34.875) => class = 1 (601.0/0.0) [303.0/6.0] 
     Except (fog <= 14.140863) and (polySyllabicWords/Sentence <= 1.944444) and (polySyllableCount <= 4.5) and (polySyllabicWords/Sentence <= 1.416667) and (wordCount > 29.5) and (Characters/Word <= 4.83156) => class = 1 (333.0/0.0) [152.0/0.0] 
     Except (fog <= 14.142217) and (polySyllabicWords/Sentence <= 1.944444) and (polySyllableCount <= 4.5) and (polySyllabicWords/Sentence <= 1.416667) and (numOfChars > 30.5) and (Syllables/Word <= 1.474937) => class = 1 (322.0/0.0) [174.0/4.0] 
     Except (fog <= 14.140863) and (polySyllabicWords/Sentence <= 1.75) and (polySyllableCount <= 4.5) => class = 1 (580.0/28.0) [298.0/21.0] 
     Except (fog <= 14.141508) and (Syllables/Sentence > 25.585714) and (Words/Sentence > 19.683333) and (sentenceCount <= 4.5) and (polySyllabicWords/Sentence <= 2.291667) and (fog > 12.269468) => class = 1 (434.0/0.0) [202.0/0.0] 
     Except (fog <= 14.140863) and (Syllables/Sentence > 25.866071) and (polySyllableCount <= 16.5) and (fog > 12.793102) and (polySyllabicWords/Sentence <= 2.9) and (wordCount <= 59.5) and (Words/Sentence > 16.166667) and (Words/Sentence <= 24.75) => class = 1 (291.0/0.0) [166.0/0.0] 
     Except (fog <= 14.140863) and (Syllables/Sentence > 25.585714) and (Words/Sentence > 19.630682) and (polySyllabicWords/Sentence > 2.656863) and (polySyllableCount <= 16.5) and (fog > 13.560337) and (Words/Sentence <= 21.55) and (numOfChars <= 523) => class = 1 (209.0/0.0) [93.0/2.0] 
     Except (fog <= 14.147578) and (Syllables/Word <= 1.649029) and (polySyllabicWords/Sentence <= 1.75) and (polySyllabicWords/Sentence > 1.303846) and (polySyllabicWords/Sentence <= 1.422619) and (fog > 9.327132) => class = 1 (183.0/0.0) [64.0/0.0]...... 

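I am also not sure what the first line refers to (40536.0/20268.0). Does it simply mean "classify as 2 unless one of the following rules applies"?

Any help is much appreciated!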

Yes, the first line means that the default classification should be (2), unless one of the rules below it holds. – etov 2013-03-21 14:24:31
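If the classifier does have to be coded by hand, the "default class unless an exception applies" reading can be sketched roughly as follows. This is only an illustration, not Weka's own code: the class and method names are made up, only one exception rule from the listing is included, and it assumes the exceptions are checked as a flat list exactly as printed.

```java
class RidorSketch {
    // Default class taken from the first line of the listing: "class = 2".
    static final int DEFAULT_CLASS = 2;

    /**
     * One exception copied from the listing (hypothetical hand-coding):
     * (fog <= 14.140863) and (polySyllabicWords/Sentence <= 1.75)
     * and (polySyllableCount <= 4.5) => class = 1
     */
    static boolean exceptionApplies(double fog,
                                    double polySyllabicWordsPerSentence,
                                    double polySyllableCount) {
        return fog <= 14.140863
            && polySyllabicWordsPerSentence <= 1.75
            && polySyllableCount <= 4.5;
    }

    /** Returns class 1 if the exception matches, otherwise the default class. */
    static int classify(double fog,
                        double polySyllabicWordsPerSentence,
                        double polySyllableCount) {
        return exceptionApplies(fog, polySyllabicWordsPerSentence, polySyllableCount)
            ? 1
            : DEFAULT_CLASS;
    }

    public static void main(String[] args) {
        // Made-up feature values, purely for illustration.
        System.out.println(classify(12.0, 1.5, 3.0));
        System.out.println(classify(15.0, 1.5, 3.0));
    }
}
```

A full hand-coded version would need one such check per exception in the listing, including the ones that are cut off at the end of the output.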

Answers

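In general, getting a confidence out of a classifier is not considered an easy task, especially if you want it to be calibrated (for example, to behave like the probability that the classification is correct). There are, however, a few relatively simple ways to get a rough estimate.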

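With tree- and rule-based classifiers, the numbers in parentheses give the number of correct/incorrect samples that fall into the bucket. So, for example, a bucket with (20,2) means that, on the training data, there were 20 cases in which the rule was correct and 2 cases in which it was incorrect. You can use this ratio as a rough confidence measure.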
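As a rough sketch of that idea, the counts can be pulled out of a printed rule and turned into a 0 to 10 score. The class and method names here are invented for illustration, and the code assumes the pair really is correct/incorrect as described above; if the pair instead means covered/misclassified, the ratio would have to be (first - second) / first.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RuleConfidence {
    // Matches the first purely numeric "(a/b)" pair on a rule line, e.g. "(2309.0/5.0)".
    private static final Pattern COUNTS =
        Pattern.compile("\\((\\d+(?:\\.\\d+)?)/(\\d+(?:\\.\\d+)?)\\)");

    /** Returns {first, second} from the first "(a/b)" pair, or null if none is found. */
    static double[] parseCounts(String ruleLine) {
        Matcher m = COUNTS.matcher(ruleLine);
        if (!m.find()) {
            return null;
        }
        return new double[] {
            Double.parseDouble(m.group(1)),
            Double.parseDouble(m.group(2))
        };
    }

    /** Rough 0-10 score, ASSUMING the pair is (correct, incorrect). */
    static double score(double correct, double incorrect) {
        double total = correct + incorrect;
        if (total == 0) {
            return 0.0; // no evidence for this bucket
        }
        return 10.0 * correct / total;
    }

    public static void main(String[] args) {
        String line = "Except (fog <= 14.140863) and (polySyllabicWords/Sentence <= 1.75) "
            + "and (polySyllableCount <= 4.5) => class = 1 (580.0/28.0) [298.0/21.0]";
        double[] c = parseCounts(line);
        System.out.printf("score = %.2f%n", score(c[0], c[1]));
    }
}
```

This is a raw training-set ratio, so it is not calibrated, and buckets with very few samples will produce overconfident scores.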

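When using regression, you can have WEKA output the classifier's actual numeric result (rather than just the class) and base a confidence measure on that.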
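One simple way to base a score on such a numeric result is to clamp it and rescale it. The sketch below assumes the target is coded as 0/1, so that the raw prediction is expected to lie roughly in [0, 1]; the class and method names are hypothetical, and the prediction itself would come from WEKA.

```java
class RegressionScore {
    /**
     * Maps a numeric prediction for a 0/1 target onto a 0-10 score.
     * Values outside [0, 1] are clamped (an assumption of this sketch).
     */
    static double toScore(double prediction) {
        double clamped = Math.max(0.0, Math.min(1.0, prediction));
        return 10.0 * clamped;
    }

    public static void main(String[] args) {
        // Made-up predictions, purely for illustration.
        double[] predictions = { -0.2, 0.73, 1.4 };
        for (double p : predictions) {
            System.out.println(p + " -> " + toScore(p));
        }
    }
}
```

Like the rule counts, this gives an ordering of instances rather than a calibrated probability.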

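More generally, according to the documentation, you can use the command-line -p option (see here). However, I am not sure how those numbers are calculated.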