2013-10-24 121 views

I am using the data below as the training set, with the attribute PlayTennis as the target, and classifying it with J48 in Weka:

@relation Weka 

@attribute Day {D1,D2,D3,D4,D5,D6,D7,D8,D9,D10,D11,D12,D13,D14} 
@attribute Outlook {Sunny,Overcast,Rain} 
@attribute Temperature {Hot,Mild,Cool} 
@attribute Humidity {High,Normal} 
@attribute Wind {Weak,Strong} 
@attribute PlayTennis {No,Yes} 

@data 
D1,Sunny,Hot,High,Weak,No 
D2,Sunny,Hot,High,Strong,No 
D3,Overcast,Hot,High,Weak,Yes 
D4,Rain,Mild,High,Weak,Yes 
D5,Rain,Cool,Normal,Weak,Yes 
D6,Rain,Cool,Normal,Strong,No 
D7,Overcast,Cool,Normal,Strong,Yes 
D8,Sunny,Mild,High,Weak,No 
D9,Sunny,Cool,Normal,Weak,Yes 
D10,Rain,Mild,Normal,Weak,Yes 
D11,Sunny,Mild,Normal,Strong,Yes 
D12,Overcast,Mild,High,Strong,Yes 
D13,Overcast,Hot,Normal,Weak,Yes 
D14,Rain,Mild,High,Strong,No 

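I also give Weka a test set containing the same data, except that the target values [Yes, No] are replaced with '?', so that it looks like this: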

@relation Weka2 

@attribute Day {D1,D2,D3,D4,D5,D6,D7,D8,D9,D10,D11,D12,D13,D14} 
@attribute Outlook {Sunny,Overcast,Rain} 
@attribute Temperature {Hot,Mild,Cool} 
@attribute Humidity {High,Normal} 
@attribute Wind {Weak,Strong} 
@attribute PlayTennis {No,Yes} 

@data 
D1,Sunny,Hot,High,Weak,? 
D2,Sunny,Hot,High,Strong,? 
D3,Overcast,Hot,High,Weak,? 
D4,Rain,Mild,High,Weak,? 
D5,Rain,Cool,Normal,Weak,? 
D6,Rain,Cool,Normal,Strong,? 
D7,Overcast,Cool,Normal,Strong,? 
D8,Sunny,Mild,High,Weak,? 
D9,Sunny,Cool,Normal,Weak,? 
D10,Rain,Mild,Normal,Weak,? 
D11,Sunny,Mild,Normal,Strong,? 
D12,Overcast,Mild,High,Strong,? 
D13,Overcast,Hot,Normal,Weak,? 
D14,Rain,Mild,High,Strong,? 

I click Start, but the result says:

=== Run information === 

Scheme:  weka.classifiers.trees.J48 -C 0.25 -M 2 
Relation:  Weka 
Instances: 14 
Attributes: 6 
       Day 
       Outlook 
       Temperature 
       Humidity 
       Wind 
       PlayTennis 
Test mode: user supplied test set: size unknown  (reading incrementally) 

=== Classifier model (full training set) === 

J48 pruned tree 
------------------ 

Outlook = Sunny 
| Humidity = High: No (3.0) 
| Humidity = Normal: Yes (2.0) 
Outlook = Overcast: Yes (4.0) 
Outlook = Rain 
| Wind = Weak: Yes (3.0) 
| Wind = Strong: No (2.0) 

Number of Leaves :  5 

Size of the tree : 8 


Time taken to build model: 0 seconds 

=== Evaluation on test set === 

Time taken to test model on supplied test set: 0 seconds 

=== Summary === 

Total Number of Instances    0  
Ignored Class Unknown Instances     7  

=== Detailed Accuracy By Class === 

       TP Rate FP Rate Precision Recall F-Measure MCC  ROC Area PRC Area Class 
       0.000 0.000 0.000  0.000 0.000  0.000 ?   ?   No 
       0.000 0.000 0.000  0.000 0.000  0.000 ?   ?   Yes 
Weighted Avg. NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  

=== Confusion Matrix === 

a b <-- classified as 
0 0 | a = No 
0 0 | b = Yes 

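It says "Ignored Class Unknown Instances = 14" and "Total Number of Instances = 0".

I don't understand what I should do.

Could you please help me?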

Answer


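The test data set should keep the target variable labelled "Yes" or "No".

This allows Weka to evaluate the quality of its predictions. Without target labels, Weka cannot tell whether a prediction is correct, so it ignores those instances in the evaluation.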
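
To make the reasoning concrete, here is a minimal Python sketch of an evaluation pass that skips instances whose class is '?'. It is a simplified model of the behaviour described, not Weka's actual code; the function name `evaluate` and the returned dictionary keys are made up for illustration.

```python
# Simplified model of an evaluation pass; this is NOT Weka's implementation.
def evaluate(actual_labels, predicted_labels, missing="?"):
    """Count evaluated, ignored, and correct instances."""
    evaluated = ignored = correct = 0
    for actual, predicted in zip(actual_labels, predicted_labels):
        if actual == missing:
            # No ground truth, so the prediction cannot be scored.
            ignored += 1
            continue
        evaluated += 1
        if actual == predicted:
            correct += 1
    # With nothing to score, accuracy is undefined (NaN).
    accuracy = correct / evaluated if evaluated else float("nan")
    return {"evaluated": evaluated, "ignored": ignored,
            "correct": correct, "accuracy": accuracy}


predicted = ["No", "No", "Yes", "Yes"]
# Every class value is '?': all instances are ignored, none are evaluated.
print(evaluate(["?", "?", "?", "?"], predicted))
# With the labels kept, every instance is scored.
print(evaluate(["No", "No", "Yes", "Yes"], predicted))
```

With every class value set to '?', the evaluated count is 0 and the accuracy is undefined, which corresponds to the "Total Number of Instances 0" line and the NaN averages in the question's output.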

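If you are only interested in the predictions, you can still use unlabelled data.

For example, using the GUI:

  1. Load the training data and select the class label.
  2. Press the "More options" button in the "Test options" box.
  3. Tick the check box next to "Output predictions".
  4. Supply the unlabelled test data and press the Start button.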

This produces output with predictions for the seemingly ignored instances (below is an example of the relevant output).

=== Predictions on test split === 
inst#, actual, predicted, error, probability distribution 
    1   ?  2:no  + 0  *1  
    2   ?  2:no  + 0  *1  
    3   ?  1:yes  + *1  0  
    4   ?  1:yes  + *1  0  
    5   ?  1:yes  + *1  0  
    6   ?  2:no  + 0  *1  
    7   ?  1:yes  + *1  0  
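
To see where such predictions come from, the pruned J48 tree printed in the question can be transcribed by hand and applied to the unlabelled rows. The Python sketch below does only that; the function name `predict_play_tennis` is hypothetical and nothing here calls Weka.

```python
# Hand transcription of the J48 tree printed in the question (not Weka code).
def predict_play_tennis(outlook, humidity, wind):
    if outlook == "Sunny":
        return "No" if humidity == "High" else "Yes"
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Rain":
        return "Yes" if wind == "Weak" else "No"
    raise ValueError(f"unexpected Outlook value: {outlook!r}")


# First seven rows of the question's test set, class value still '?'.
rows = [
    "D1,Sunny,Hot,High,Weak,?",
    "D2,Sunny,Hot,High,Strong,?",
    "D3,Overcast,Hot,High,Weak,?",
    "D4,Rain,Mild,High,Weak,?",
    "D5,Rain,Cool,Normal,Weak,?",
    "D6,Rain,Cool,Normal,Strong,?",
    "D7,Overcast,Cool,Normal,Strong,?",
]

predictions = []
for row in rows:
    day, outlook, _temperature, humidity, wind, _label = row.split(",")
    label = predict_play_tennis(outlook, humidity, wind)
    predictions.append(label)
    print(day, label)
```

The tree never consults the class column, so a '?' there does not prevent a prediction. It only prevents the prediction from being scored.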


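Thanks, I did that: I enabled "Output predictions", but "Unknown Instances = [all instances]" is still there. The prediction error for each instance equals 1 –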


You put the target values back into the test data set, but the evaluation still doesn't make sense? – Walter


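I copied the exact training data file into a new file and only changed the target attribute from Yes or No to '?'. But it says all instances belong to "Ignored Class Unknown Instances". –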