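Q: Is there a random forest example that keeps the train and test sets separate? The current example I found in the Accord.NET ML test project uses the same data for both training and testing. How do I synchronize the train and test codebooks in Accord.NET?
Apparently the problem I am having is synchronizing the labels (integers) generated for the test and training sets. I generate the train labels like this: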
int[] trainOutputs = trainCodebook.Translate("Output", trainLabels);
And the test labels similarly:
int[] testOutputs = testCodebook.Translate("Output", testLabels);
Finally I train with the train data and test with the test data:
var forest = teacher.Learn(trainVectors, trainOutputs);
int[] predicted = forest.Decide(testVectors);
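Unless the first three rows are the same in both the train and test sets, the labels are encoded differently, and the error rate is accordingly very high.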
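To illustrate the mismatch described above, here is a minimal sketch that uses a plain dictionary instead of Accord.NET. `BuildCodes` is a hypothetical stand-in for a codebook, and it assumes that integer codes are handed out in order of first appearance, which is what the symptom suggests but is not confirmed here.

```csharp
using System;
using System.Collections.Generic;

public static class CodebookMismatchDemo
{
    // Hypothetical stand-in for a codebook (not Accord.NET's implementation):
    // each new label gets the next free integer, in order of first appearance.
    public static Dictionary<string, int> BuildCodes(IEnumerable<string> labels)
    {
        var codes = new Dictionary<string, int>();
        foreach (string label in labels)
        {
            if (!codes.ContainsKey(label))
            {
                codes.Add(label, codes.Count);
            }
        }
        return codes;
    }

    public static void Main()
    {
        // Same three classes, but they first appear in a different order.
        string[] trainLabels = { "-1", "0", "1", "0" };
        string[] testLabels = { "1", "-1", "0", "1" };

        Dictionary<string, int> trainCodes = BuildCodes(trainLabels);
        Dictionary<string, int> testCodes = BuildCodes(testLabels);

        foreach (string label in new[] { "-1", "0", "1" })
        {
            Console.WriteLine("label " + label
                + ": train code " + trainCodes[label]
                + ", test code " + testCodes[label]);
        }
    }
}
```

Under that assumption, every label ends up with a different integer in the two dictionaries, so predictions expressed in train codes are compared against expected outputs expressed in test codes.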
I tried simply creating my codebook manually from the three ternary strings:
new Codification("-1","0","1");
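Unfortunately this produces a runtime error stating that the given key was not present in the dictionary. I am sure there is a way to synchronize key generation across two separate codebooks. I can get it to work with the code below if I add three rows of my train data (containing all three keys) to the top of the test data. Not my preferred solution ;=)
Here is the whole test I am running: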
[Test]
public void test_learn()
{
    Accord.Math.Random.Generator.Seed = 1;

    /////////// TRAINING SET ///////////
    // First, let's load the TRAINING set into an array of text that we can process
    string[][] text = Resources.train.Split(new[] { "\r\n" },
        StringSplitOptions.RemoveEmptyEntries).Apply(x => x.Split(','));

    int length = text[0].Length;
    List<int> columns = new List<int>();
    for (int i = 1; i < length; i++)
    {
        columns.Add(i);
    }

    double[][] trainVectors = text.GetColumns(columns.ToArray()).To<double[][]>();

    // The first column contains the expected ternary category (i.e. -1, 0, or 1)
    string[] trainLabels = text.GetColumn(0);
    var trainCodebook = new Codification("Output", trainLabels);
    int[] trainOutputs = trainCodebook.Translate("Output", trainLabels);

    ////////// TEST SET ////////////
    text = Resources.test.Split(new[] { "\r\n" },
        StringSplitOptions.RemoveEmptyEntries).Apply(x => x.Split(','));

    double[][] testVectors = text.GetColumns(columns.ToArray()).To<double[][]>();
    string[] testLabels = text.GetColumn(0);
    var testCodebook = new Codification("Output", testLabels);
    int[] testOutputs = testCodebook.Translate("Output", testLabels);

    var teacher = new RandomForestLearning()
    {
        NumberOfTrees = 10,
    };

    var forest = teacher.Learn(trainVectors, trainOutputs);
    int[] predicted = forest.Decide(testVectors);

    int lineNum = 1;
    foreach (int prediction in predicted)
    {
        Console.WriteLine("Prediction " + lineNum + ": "
            + trainCodebook.Translate("Output", prediction));
        lineNum++;
    }

    // I'm using the test vectors to calculate the error rate
    double error = new ZeroOneLoss(testOutputs).Loss(forest.Decide(testVectors));
    Console.WriteLine("Error term is " + error);
    Assert.IsTrue(error < 0.20); // humble expectations ;-)
}
You should really have **only one** codebook, created from the training set, and use it to encode both the training **and** the test sets. – Cesar
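A minimal sketch of that suggestion follows. It uses only the `Codification` constructor and `Translate` overloads that already appear in the question. The class name `SharedCodebookSketch`, the helper `Encode`, and the inline label arrays are hypothetical, standing in for column 0 of the real train and test files.

```csharp
using System;
using Accord.Statistics.Filters;

public static class SharedCodebookSketch
{
    // Hypothetical helper: encodes both label sets with ONE codebook
    // that is built from the training labels only.
    public static Tuple<int[], int[]> Encode(
        Codification codebook, string[] trainLabels, string[] testLabels)
    {
        int[] trainOutputs = codebook.Translate("Output", trainLabels);
        int[] testOutputs = codebook.Translate("Output", testLabels);
        return Tuple.Create(trainOutputs, testOutputs);
    }

    public static void Main()
    {
        // Illustrative labels only; the question reads these from its resources.
        string[] trainLabels = { "-1", "0", "1", "0" };
        string[] testLabels = { "1", "-1", "0", "1" };

        // The single codebook, created from the training set.
        var codebook = new Codification("Output", trainLabels);

        Tuple<int[], int[]> encoded = Encode(codebook, trainLabels, testLabels);
        int[] testOutputs = encoded.Item2;

        // Decoding also goes through the same codebook.
        for (int i = 0; i < testOutputs.Length; i++)
        {
            Console.WriteLine(testLabels[i] + " -> " + testOutputs[i]
                + " -> " + codebook.Translate("Output", testOutputs[i]));
        }
    }
}
```

In the question's test this would mean dropping `testCodebook` and computing `testOutputs` with `trainCodebook.Translate("Output", testLabels)`. A test label that never occurs in the training set would presumably fail with the same "key not in dictionary" error, so the training data should cover all classes. As for the manual attempt, `new Codification("-1","0","1")` most likely binds to the same constructor used above, with "-1" taken as the column name, which would explain the missing "Output" key. Something like `new Codification("Output", "-1", "0", "1")` may be what was intended, though that is an assumption and has not been checked against the library.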