KNN text classification in R with K-fold cross-validation

I have created a text classifier using KNN that classifies comments into various categories, such as:

Comment                    Category
Good Service provided      Service
Excellent Communication    Communication

I have done the classification with:

knn(modeldata[train, ], modeldata[test,] , cl[train], k =2, use.all = TRUE)

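Now I want to evaluate this model using K-fold cross-validation. I am looking for something that tells me whether the model is overfitting, underfitting, and so on.

I used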

knn.cv(modeldata[train, ], cl[train], k =2, use.all = TRUE)

but the help page for this command says that it returns NA when the classification is in doubt. Please advise.
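For reference, knn.cv in the class package performs leave-one-out cross-validation rather than K-fold. A minimal sketch of a manual K-fold loop around knn() is shown below. It uses the numeric columns of iris as a stand-in, since modeldata and cl are not shown in the question, and the choices of 5 folds and k = 3 are arbitrary assumptions.

```r
library(class)  # provides knn()

set.seed(42)                       # reproducible fold assignment
x <- as.matrix(iris[, 1:4])        # stand-in for modeldata (numeric features)
y <- iris$Species                  # stand-in for cl (class labels)
n_folds <- 5                       # assumed number of folds

# Randomly assign every row to exactly one fold
fold_id <- sample(rep(seq_len(n_folds), length.out = nrow(x)))

cv_acc    <- numeric(n_folds)      # accuracy on each held-out fold
train_acc <- numeric(n_folds)      # accuracy on the rows used for fitting

for (f in seq_len(n_folds)) {
  held_out <- fold_id == f

  # Predict the held-out fold from the remaining rows
  pred_test <- knn(x[!held_out, ], x[held_out, ], y[!held_out], k = 3)
  # Predict the fitting rows themselves, for comparison
  pred_train <- knn(x[!held_out, ], x[!held_out, ], y[!held_out], k = 3)

  cv_acc[f]    <- mean(as.character(pred_test)  == as.character(y[held_out]))
  train_acc[f] <- mean(as.character(pred_train) == as.character(y[!held_out]))
}

mean(train_acc)  # average accuracy on the fitting data
mean(cv_acc)     # average accuracy on held-out data
```

A training accuracy far above the held-out accuracy is the usual sign of overfitting, while low values for both point toward underfitting. Repeating the loop for several values of k shows how that gap changes.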

Source

2016-10-17 Sourabh

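Which package are you using for knn? You can use caret. For cross-validation, something like the following works (shown here with the iris dataset):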

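library(caret)  # provides train() and trainControl()

training <- iris
# Repeated K-fold CV: 10 folds by default, repeated 3 times
ctrl <- trainControl(method = "repeatedcv", repeats = 3)
knnFit <- train(Species ~ ., data = training, method = "knn",
                trControl = ctrl, preProcess = c("center", "scale"))
knnFit

with the output: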

k-Nearest Neighbors 

150 samples 
    4 predictor 
    3 classes: 'setosa', 'versicolor', 'virginica' 

Pre-processing: centered (4), scaled (4) 
Resampling: Cross-Validated (10 fold, repeated 3 times) 
Summary of sample sizes: 135, 135, 135, 135, 135, 135, ... 
Resampling results across tuning parameters: 

  k  Accuracy   Kappa
  5  0.9511111  0.9266667
  7  0.9577778  0.9366667
  9  0.9533333  0.9300000

Accuracy was used to select the optimal model using the largest value. 
The final value used for the model was k = 7.

Source

2016-10-17 08:01:13

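I am using the "class" package for KNN. The code above works on the iris dataset but not on my dataset, which has only two columns; I am not sure whether the number of columns in iris is the reason. When I run the command above, I get the following message: Warning in preProcess.default(thresh = 0.95, k = 5, method = c("center", : These variables have zero variances: – Sourabh

I also tried the following statement: knnFit1 <- train(Category_Text, data = x, method = "knn", preProcess = NULL, trControl = trainControl(method = "cv", number = 5, classProbs = FALSE)). Error message: One or more factor levels in the outcome has no data. I checked all the factors and found no blank or empty levels. – Sourabh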
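One possible reading of these messages is that the raw text column is being used directly as a predictor, whereas KNN needs numeric features. The sketch below is only an illustration under that assumption: it builds a simple word-count matrix in base R from a made-up data frame with hypothetical columns Text and Category, then runs a small K-fold loop with class::knn. The toy comments, the 3 folds and k = 1 are all assumptions, not the asker's data or settings.

```r
library(class)  # provides knn()

# Hypothetical toy data in the same two-column shape as the question
x <- data.frame(
  Text = c("Good service provided", "Excellent communication",
           "Service was quick", "Poor communication from staff",
           "Great service", "Clear communication"),
  Category = c("Service", "Communication", "Service",
               "Communication", "Service", "Communication"),
  stringsAsFactors = FALSE
)
# droplevels() removes any factor levels that have no rows
y <- droplevels(factor(x$Category))

# Tokenise: lower-case, then split on anything that is not a letter
tokens <- strsplit(tolower(x$Text), "[^a-z]+")
vocab  <- sort(unique(unlist(tokens)))
vocab  <- vocab[nzchar(vocab)]      # discard empty tokens

# Word-count matrix: one row per comment, one column per word
dtm <- t(vapply(tokens,
                function(tk) as.numeric(table(factor(tk, levels = vocab))),
                numeric(length(vocab))))
colnames(dtm) <- vocab

# K-fold cross-validation on the numeric matrix
set.seed(1)
n_folds <- 3
fold_id <- sample(rep(seq_len(n_folds), length.out = nrow(dtm)))
correct <- logical(nrow(dtm))

for (f in seq_len(n_folds)) {
  held_out <- fold_id == f
  pred <- knn(dtm[!held_out, , drop = FALSE],
              dtm[held_out, , drop = FALSE],
              y[!held_out], k = 1)
  correct[held_out] <- as.character(pred) == as.character(y[held_out])
}

mean(correct)  # share of held-out comments classified correctly
```

With real data, a dedicated text-mining package would normally replace the hand-rolled tokeniser, and the same numeric matrix could be passed to caret's train() through its x and y arguments instead of a formula.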
