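I am using Mallet's Naive Bayes algorithm to classify a large data set. My question is how I should split my data set into training and test chunks. Can anyone tell me the best approach for a train/test split? My documents are sorted by date. I found this method for a train/test split (Train-Test Split + Text Classification + Naive Bayes):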
    public Trial testTrainSplit(InstanceList instances) {
        int TRAINING = 0;
        int TESTING = 1;
        int VALIDATION = 2;

        // Split the input list into training (90%) and testing (10%) lists.
        // The division takes place by creating a copy of the list,
        // randomly shuffling the copy, and then allocating
        // instances to each sub-list based on the provided proportions.
        InstanceList[] instanceLists =
            instances.split(new Randoms(),
                            new double[] {0.9, 0.1, 0.0});

        // The third position is for the "validation" set,
        // which is a set of instances not used directly
        // for training, but available for determining
        // when to stop training and for estimating optimal
        // settings of nuisance parameters.
        // Most Mallet ClassifierTrainers cannot currently take advantage
        // of validation sets.
        Classifier classifier = trainClassifier(instanceLists[TRAINING]);
        return new Trial(classifier, instanceLists[TESTING]);
    }
However, I don't think this is appropriate when the documents are sorted by date. Can anyone help me?
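
To make the concern concrete: the snippet shuffles the instances at random before splitting, so older and newer documents end up mixed across both sets. One possible alternative for date-sorted data is a chronological split, where the earliest documents are used for training and the most recent ones for testing. The following is only a minimal sketch of that idea using plain Java lists. The class `ChronologicalSplit` and its `splitInOrder` method are hypothetical names for illustration and are not part of Mallet, and the sketch assumes the list is already sorted by date.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Mallet class): splits an already date-sorted
// list in order, without shuffling, so the test part holds the newest items.
class ChronologicalSplit {

    // Returns a two-element list: [train, test].
    // The first trainFraction of the items go to train, the rest to test.
    static <T> List<List<T>> splitInOrder(List<T> sortedByDate, double trainFraction) {
        if (trainFraction < 0.0 || trainFraction > 1.0) {
            throw new IllegalArgumentException("trainFraction must be in [0, 1]");
        }
        // Index of the first item that belongs to the test part.
        int cut = (int) Math.round(sortedByDate.size() * trainFraction);

        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(sortedByDate.subList(0, cut)));
        parts.add(new ArrayList<>(sortedByDate.subList(cut, sortedByDate.size())));
        return parts;
    }

    public static void main(String[] args) {
        // Placeholder document ids, oldest first (assumed ordering).
        List<String> docs = List.of(
            "doc-01", "doc-02", "doc-03", "doc-04", "doc-05",
            "doc-06", "doc-07", "doc-08", "doc-09", "doc-10");

        List<List<String>> parts = splitInOrder(docs, 0.9);
        System.out.println("train size: " + parts.get(0).size());
        System.out.println("test size:  " + parts.get(1).size());
    }
}
```

Mallet's `InstanceList` appears to provide a `splitInOrder(double[] proportions)` method that does the same thing directly on an `InstanceList`, so it may be enough to swap that in for `split(new Randoms(), ...)`. That is worth confirming against the Javadoc of the Mallet version in use.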