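I previously used cross_validation.train_test_split to split my dataset 90:10 by passing an explicit test size. I am now moving to StratifiedShuffleSplit (which in scikit-learn combines KFold and ShuffleSplit). When doing a stratified split, is it better to specify the test size, or should I leave it unspecified?
This is what I am doing: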
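import numpy as np
# Pre-0.18 scikit-learn API, matching the call signature used below.
from sklearn.cross_validation import StratifiedShuffleSplit

# Read one whitespace-tokenized document per line.
train = []
with open("/Users/minks/Documents/documents.txt") as f:
    for line in f:
        train.append(line.strip().split())
train = np.array(train)

# Read the whitespace-separated class labels.
labels = []
with open("/Users/minks/Documents/Labels.txt") as t:
    for line in t:
        labels.extend(line.strip().split())
labels = np.array(labels)

# Five stratified random splits, each holding out 10% of the samples for testing.
kf = StratifiedShuffleSplit(labels, n_iter=5, test_size=0.10)
for train_index, test_index in kf:
    X_train, X_test = train[train_index], train[test_index]
    Y_train, Y_test = labels[train_index], labels[test_index]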
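I would like to know whether specifying test_size is a good decision for performance, because as far as I can tell, if I do not specify it, a random ratio is picked.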
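To make concrete what a fixed test_size does under stratification, here is a minimal standard-library sketch. It is not scikit-learn's implementation: the helper stratified_shuffle_split, its seed parameter, and the toy labels are all hypothetical and chosen only for illustration. The idea is that each class is shuffled separately and roughly test_size of it is held out, so the overall ratio and the per-class proportions both stay fixed.

```python
import random
from collections import Counter, defaultdict


def stratified_shuffle_split(labels, test_size=0.10, seed=0):
    """Hypothetical helper: return (train_idx, test_idx), splitting each class at ~test_size."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Hold out roughly test_size of this class, but at least one sample.
        n_test = max(1, round(test_size * len(indices)))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# Hypothetical toy labels: 80 "neg" documents and 20 "pos" documents.
labels = ["neg"] * 80 + ["pos"] * 20
train_idx, test_idx = stratified_shuffle_split(labels, test_size=0.10)

print(len(train_idx), len(test_idx))
print(Counter(labels[i] for i in test_idx))
```

With these toy labels the split is 90 training and 10 test samples, and the test set holds 8 "neg" and 2 "pos", which is the same 80:20 class balance as the full data. Changing seed changes which samples are held out, not how many.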