IndexOutOfBoundsException when adding more instances to a training set in Weka

I want to add more instances to my training set and run 10-fold cross-validation.
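My instances are in string format, so I use the StringToWordVector filter to convert them to numeric attributes. If I don't add the extra web-page instances, everything works the way I want. However, when I add the statement trainSet.addAll(data2); and pass trainSet to the filter, I get a strange IndexOutOfBoundsException in the first iteration, at the line Instances fTrainSet = Filter.useFilter(trainSet, filter);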
import java.io.File;
import weka.core.Instances;
import weka.core.stemmers.IteratedLovinsStemmer;
import weka.core.stemmers.Stemmer;
import weka.core.stopwords.WordsFromFile;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

Instances data = getDataFromFile("pathtofile.arff");    // main dataset, 1821 instances
Instances data2 = getDataFromFile("anotherpath.arff");  // 709 instances I want to add
int folds = 10;
for (int i = 0; i < folds; i++) {
    Instances trainSet = data.trainCV(folds, i);         // training set
    System.out.println(trainSet.numInstances());         // prints 1638
    Instances testSet = data.testCV(folds, i);           // testing set

    // add more instances
    trainSet.addAll(data2);
    System.out.println(trainSet.numInstances());         // prints 2347

    // configure the filter
    StringToWordVector filter = new StringToWordVector();
    filter.setWordsToKeep(10000);
    filter.setTFTransform(true);
    filter.setLowerCaseTokens(true);
    filter.setOutputWordCounts(true);
    Stemmer stemmer = new IteratedLovinsStemmer();
    filter.setStemmer(stemmer);
    WordsFromFile stopwords = new WordsFromFile();
    stopwords.setStopwords(new File(".data/stopwords2.txt"));
    filter.setStopwordsHandler(stopwords);
    // Weka expects setInputFormat to be the last call before the filter is applied,
    // so it is moved here, after all options are set
    filter.setInputFormat(trainSet);

    Instances fTrainSet = Filter.useFilter(trainSet, filter);  // error!!!
    Instances fTestSet = Filter.useFilter(testSet, filter);
    ....
    // classification and evaluation....
}
When I try to apply the filter, I get the following error:
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 2161, Size: 1749
at java.util.ArrayList.rangeCheck(Unknown Source)
at java.util.ArrayList.get(Unknown Source)
at weka.core.Attribute.addStringValue(Attribute.java:924)
at weka.core.StringLocator.copyStringValues(StringLocator.java:150)
at weka.core.StringLocator.copyStringValues(StringLocator.java:91)
at weka.filters.Filter.copyValues(Filter.java:399)
at weka.filters.Filter.bufferInput(Filter.java:342)
at weka.filters.unsupervised.attribute.StringToWordVector.input(StringToWordVector.java:655)
at weka.filters.Filter.useFilter(Filter.java:692)
at CrossValidationExample.main(CrossValidationExample.java:108)
What could be wrong?
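We don't know what 'folds' is, and it matters most because it is the upper bound of 'i' in the loop. Please provide more code. – xenteros
folds = 10, edited – xro7
You expect 'folds = 10'. If you get an 'ArrayOutOfBoundsException', it has to be something index-related. Something has gone wrong, and this variable is one of the suspects, so please give us more code. – xenteros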
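
One possible reading of the stack trace, offered as an assumption rather than a confirmed diagnosis: Weka keeps the text of a string attribute in the dataset header, and each instance stores only an index into that list. If trainSet.addAll(data2) copies rows without re-registering their strings, the rows coming from data2 would carry indices that are only valid for data2's header, which would be consistent with ArrayList.get failing underneath Attribute.addStringValue. The toy Java sketch below does not use Weka at all. StringIndexDemo, StringAttribute, copyIndexOnly and copyByValue are hypothetical names invented for illustration; the sketch only shows why a raw index copied between two headers can go out of bounds, and how copying by value avoids it.

```java
import java.util.ArrayList;
import java.util.List;

public class StringIndexDemo {

    /** Toy stand-in (not Weka's class): values live in the header, rows hold indices. */
    public static final class StringAttribute {
        private final List<String> values = new ArrayList<>();

        /** Stores a value, reusing an existing slot if present, and returns its index. */
        public int addStringValue(String value) {
            int idx = values.indexOf(value);
            if (idx >= 0) {
                return idx;
            }
            values.add(value);
            return values.size() - 1;
        }

        /** Looks a value up by index; throws IndexOutOfBoundsException for a foreign index. */
        public String value(int index) {
            return values.get(index);
        }

        public int numValues() {
            return values.size();
        }
    }

    /** Naive merge: reuses the raw index, which is only meaningful for the source header. */
    public static int copyIndexOnly(int sourceIndex) {
        return sourceIndex;
    }

    /** Safer merge: resolves the text in the source header and re-registers it in the target. */
    public static int copyByValue(StringAttribute source, int sourceIndex, StringAttribute target) {
        return target.addStringValue(source.value(sourceIndex));
    }

    public static void main(String[] args) {
        StringAttribute trainHeader = new StringAttribute();
        trainHeader.addStringValue("first page");

        StringAttribute extraHeader = new StringAttribute();
        extraHeader.addStringValue("page a");
        extraHeader.addStringValue("page b");
        int row = extraHeader.addStringValue("page c"); // third value in the extra header

        try {
            // The index belongs to extraHeader; trainHeader holds only one value.
            trainHeader.value(copyIndexOnly(row));
        } catch (IndexOutOfBoundsException e) {
            System.out.println("naive copy failed: " + e.getMessage());
        }

        int remapped = copyByValue(extraHeader, row, trainHeader);
        System.out.println("remapped value: " + trainHeader.value(remapped));
    }
}
```

If this reading applies, the analogous step in Weka would presumably be to re-add each string of a data2 row through the training set's own attribute (Attribute.addStringValue returns the new index) before adding the row, instead of calling addAll directly. This has not been checked against the Weka version used in the question.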