2016-01-28

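Returning the best decision tree from cross-validation in Matlab

In Matlab, what is the correct way to find the model with the lowest error from a cross-validated fit? My goal is to show the error rate of the best cross-validated decision tree as a function of the size of the test data, and I have the following code: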

chess = csvread(filename); 
predictors = chess(:,1:6); 
class = chess(:,7); 

cvpart = cvpartition(class,'holdout', 0.3); 
Xtrain = predictors(training(cvpart),:); 
Ytrain = class(training(cvpart),:); 
Xtest = predictors(test(cvpart),:); 
Ytest = class(test(cvpart),:); 

numElements = numel(training(cvpart)); 
trainErrorGrowing = zeros(numElements,1); 
testErrorGrowing = zeros(numElements,1); 

for n = 100:numElements 
    data = datasample(training(cvpart), n); 
    dataX = predictors(data,:); 
    dataY = class(data,:); 

    % Fit the decision tree 
    tree = fitctree(dataX, dataY, 'AlgorithmForCategorical', 'PullLeft', 'CrossVal', 'on'); 

    % Loop to find the model with the least error 
    kfoldError = 100; 
    bestTree = tree.Trained{1}; 
    for i = 1:10 
        err = loss(tree.Trained{i}, Xtrain, Ytrain); 
        if err < kfoldError 
            kfoldError = err; 
            bestTree = tree.Trained{i}; 
        end 
    end 
    trainErrorGrowing(n) = loss(bestTree,Xtest,Ytest,'Subtrees','all'); % Training Error 
    testErrorGrowing(n) = loss(bestTree,Xtest,Ytest,'Subtrees','all'); % Testing Error 
end 

plot(numElements,testErrorGrowing); 

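It is important that the indices used for the final test data are not used in any way to train the tree. However, when I try to run this code, I get the following error on the line

err = loss(tree.Trained{i}, Xtrain, Ytrain); 

Error using classreg.learning.internal.classCount 
You passed an unknown class '1' of type double. 

I tried casting the iterator to int8 and to char, but got the same error both times. Is there a simpler way to find the decision tree with the lowest error, or at least a way to reference the individual trained trees?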

Answer


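Suppose you use 10-fold cross-validation when fitting the model. You can then use the kfoldLoss function to get the loss for each CV fold separately, and pick the trained model with the lowest CV loss, as follows: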

modelLosses = kfoldLoss(tree,'mode','individual'); 

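If you used 10-fold cross-validation during fitting, the code above returns a vector of length 10 (one CV error value per fold). Suppose the trained model with the lowest CV error is the k-th one (for example, found with [~, k] = min(modelLosses)); you would then use: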

testSetPredictions = predict(tree.Trained{k}, testSetFeatures);
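
Putting the two steps together, a minimal sketch might look like the following. It is not the asker's pipeline: the synthetic X and Y, the sample size n = 150, and the names trainIdx, rows, cvTree and minLoss are assumptions made only so the example runs on its own. It also draws row numbers from the training partition instead of sampling the logical mask returned by training(cvpart).

```matlab
% Stand-in data (assumption): 300 rows, 6 predictors, two numeric classes.
rng(0);
X = randn(300, 6);
Y = double(X(:,1) + X(:,2) > 0) + 1;

% Hold out 30% of the rows for the final test.
cvpart   = cvpartition(Y, 'HoldOut', 0.3);
trainIdx = find(training(cvpart));     % row numbers of the training rows
Xtest    = X(test(cvpart), :);
Ytest    = Y(test(cvpart));

% Draw n distinct training rows (n must not exceed numel(trainIdx)).
n    = 150;
rows = datasample(trainIdx, n, 'Replace', false);

% Cross-validated fit; 'CrossVal','on' uses 10 folds by default.
cvTree = fitctree(X(rows,:), Y(rows), 'CrossVal', 'on');

% One loss per fold, then the index k of the fold with the lowest loss.
modelLosses  = kfoldLoss(cvTree, 'Mode', 'individual');
[minLoss, k] = min(modelLosses);
bestTree     = cvTree.Trained{k};

% Evaluate the selected tree on the held-out rows only.
testError          = loss(bestTree, Xtest, Ytest);
testSetPredictions = predict(bestTree, Xtest);
```

Sampling from find(training(cvpart)) keeps every selected row inside the training partition. Sampling the logical mask itself returns a random vector of true/false values, which then selects rows by position and may leave out some classes. That could be what triggers the "unknown class" error in the question, although this is a guess and not something the original post confirms.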