2014-07-17 20 views
9

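Error in ConfusionMatrix: the data and reference factors must have the same number of levels (R caret)

I have trained a tree model with the R caret package. Now I want to generate a confusion matrix, but I keep getting the following error: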

Error in confusionMatrix.default(predictionsTree, testdata$catgeory) : the data and reference factors must have the same number of levels

prob <- 0.5 #Specify class split 
singleSplit <- createDataPartition(modellingData2$category, p=prob, 
            times=1, list=FALSE) 
cvControl <- trainControl(method="repeatedcv", number=10, repeats=5) 
traindata <- modellingData2[singleSplit,] 
testdata <- modellingData2[-singleSplit,] 
treeFit <- train(traindata$category~., data=traindata, 
       trControl=cvControl, method="rpart", tuneLength=10) 
predictionsTree <- predict(treeFit, testdata) 
confusionMatrix(predictionsTree, testdata$catgeory) 

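The levels are the same on both objects, and I cannot figure out what the problem is. Their structure and levels are shown below; they should be identical. Any help would be greatly appreciated, as this is driving me crazy!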

> str(predictionsTree) 
Factor w/ 30 levels "16-Merchant Service Charge",..: 28 22 22 22 22 6 6 6 6 6 ... 
> str(testdata$category) 
Factor w/ 30 levels "16-Merchant Service Charge",..: 30 30 7 7 7 7 7 30 7 7 ... 

> levels(predictionsTree) 
[1] "16-Merchant Service Charge" "17-Unpaid Cheque Fee"   "18-Gov. Stamp Duty"   "Misc"       "26-Standard Transfer Charge" 
[6] "29-Bank Giro Credit"   "3-Cheques Debit"    "32-Standing Order - Debit" "33-Inter Branch Payment"  "34-International"    
[11] "35-Point of Sale"    "39-Direct Debits Received" "4-Notified Bank Fees"   "40-Cash Lodged"    "42-International Receipts" 
[16] "46-Direct Debits Paid"  "56-Credit Card Receipts"  "57-Inter Branch"    "58-Unpaid Items"    "59-Inter Company Transfers" 
[21] "6-Notified Interest Credited" "61-Domestic"     "64-Charge Refund"    "66-Inter Company Transfers" "67-Suppliers"     
[26] "68-Payroll"     "69-Domestic"     "73-Credit Card Payments"  "82-CHAPS Fee"     "Uncategorised" 

> levels(testdata$category) 
[1] "16-Merchant Service Charge" "17-Unpaid Cheque Fee"   "18-Gov. Stamp Duty"   "Misc"       "26-Standard Transfer Charge" 
[6] "29-Bank Giro Credit"   "3-Cheques Debit"    "32-Standing Order - Debit" "33-Inter Branch Payment"  "34-International"    
[11] "35-Point of Sale"    "39-Direct Debits Received" "4-Notified Bank Fees"   "40-Cash Lodged"    "42-International Receipts" 
[16] "46-Direct Debits Paid"  "56-Credit Card Receipts"  "57-Inter Branch"    "58-Unpaid Items"    "59-Inter Company Transfers" 
[21] "6-Notified Interest Credited" "61-Domestic"     "64-Charge Refund"    "66-Inter Company Transfers" "67-Suppliers"     
[26] "68-Payroll"     "69-Domestic"     "73-Credit Card Payments"  "82-CHAPS Fee"     "Uncategorised"  
+0

In your error message, 'category' is spelled 'catgeory'. If that is not the cause, what is the output of 'identical(levels(predictionsTree), levels(testdata$category))'? – fxi
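A minimal sketch of why the typo alone could plausibly produce this error, using a made-up data frame (the values and the 'amount' column are illustrative assumptions, not the asker's data). On a plain data frame, '$' returns NULL for a column name that does not exist, so the reference handed to confusionMatrix() would carry zero levels while the predictions carry all of theirs.

```r
# Hypothetical stand-in for the asker's test set (values are made up).
testdata <- data.frame(
  category = factor(c("Misc", "68-Payroll", "Misc")),
  amount   = c(12.5, 900, 3.2)
)

correct    <- testdata$category   # the real column
misspelled <- testdata$catgeory   # no such column, so `$` silently returns NULL

is.null(misspelled)   # the misspelled access yields NULL
nlevels(correct)      # number of levels in the real column
nlevels(misspelled)   # a NULL has no levels, hence a level-count mismatch

# Bracket indexing by name fails loudly instead of returning NULL.
result <- tryCatch(
  testdata[, "catgeory"],
  error = function(e) conditionMessage(e)
)
print(result)
```

Selecting columns with 'testdata[, "name"]' is one way to have a misspelled name surface as an error at the point of access rather than later inside another function.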

+0

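Hi, thank you. I have fixed the silly spelling mistake... doh! I ran the identical() check and it output [1] TRUE. Now I get the following error when I run the confusionMatrix function: Error in table(data, reference, dnn = dnn, ...) : all arguments must have the same length – user2987739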

+0

Check for another misspelled 'catgeory', compare 'length(testdata$category)' with 'length(predictionsTree)', and look at the summary of both vectors. If you only need a simple confusion matrix: 'table(predictionsTree, testdata$category)' – fxi

Answers

1

Maybe your model is not predicting one of the factor levels. Use the table() function instead of confusionMatrix() to see whether that is the problem.
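As a rough illustration of that check, the sketch below uses made-up labels in which the model never predicts level "c". All names and values here are assumptions for the example. It also shows one way to re-level predictions against the reference when they arrive with fewer levels.

```r
# Made-up labels: the model never predicts "c".
lvls      <- c("a", "b", "c")
reference <- factor(c("a", "b", "c", "c", "a"), levels = lvls)
predicted <- factor(c("a", "b", "a", "b", "a"), levels = lvls)

# Same length and same level set, so table() gives a square 3 x 3 matrix;
# the row for "c" is all zeros because "c" is never predicted.
cm <- table(Predicted = predicted, Reference = reference)
print(cm)

# If predictions come back with fewer levels, re-level them
# against the reference before comparing.
predicted_short <- factor(c("a", "b", "a", "b", "a"))
predicted_fixed <- factor(predicted_short, levels = levels(reference))
identical(levels(predicted_fixed), levels(reference))
```

An all-zero row or column in the table points to a class that is never predicted or never observed, which is easier to spot here than in an error message.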

+1

You could add this as a comment. –

-2

There may be missing values in the test data. Add the following line before "predictionsTree <- predict(treeFit, testdata)" to remove the NAs. I had the same error, and now it works for me.

testdata <- testdata[complete.cases(testdata),] 
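A small sketch of what that line does, on a made-up test set with one missing predictor value (the data are an assumption for illustration). Filtering before predicting keeps the predictions and the reference column the same length.

```r
# Hypothetical test set with one missing predictor value.
testdata <- data.frame(
  category = factor(c("Misc", "68-Payroll", "Misc", "68-Payroll")),
  amount   = c(12.5, NA, 3.2, 900)
)

# Count the rows that contain at least one NA.
n_incomplete <- sum(!complete.cases(testdata))

# Keep only the fully observed rows.
testdata_complete <- testdata[complete.cases(testdata), ]

nrow(testdata)            # rows before filtering
nrow(testdata_complete)   # rows after filtering
```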
0

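The length problem you are running into is probably due to NAs in the training set. Either drop the incomplete cases, or impute so that you have no missing values.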

0

Try specifying na.pass for the na.action option:

predictionsTree <- predict(treeFit, testdata,na.action = na.pass) 
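To show why this option affects the length of the result, here is a base R sketch using model.frame() on made-up data. It assumes that predict() on a caret model fitted through the formula interface handles missing values in a comparable way; caret itself is not used in the sketch.

```r
# Hypothetical new data with one missing predictor value.
newdata <- data.frame(amount = c(12.5, NA, 3.2, 900))

# na.omit drops the incomplete row, so anything computed from this
# frame has one element fewer than the original data.
dropped <- model.frame(~ amount, data = newdata, na.action = na.omit)

# na.pass keeps every row, NAs included, so lengths stay aligned.
kept <- model.frame(~ amount, data = newdata, na.action = na.pass)

nrow(dropped)
nrow(kept)
```

With na.pass the row count matches the input, which is what table() and confusionMatrix() need when the result is compared against the original reference column.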
0

I had the same problem, but went ahead and changed the data right after reading in the data file, like so:

data = na.omit(data)

Thanks to all for the pointers!
