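I have a set of documents and a set of labels. Right now I am using train_test_split to split my dataset 90:10. However, I would like to use K-fold cross-validation instead. How do I do a K-fold cross-validation split into train and test sets?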

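from sklearn.model_selection import train_test_split  # sklearn.cross_validation before scikit-learn 0.18

# Read one tokenized document per line
train = []
with open("/Users/rte/Documents/Documents.txt") as f:
    for line in f:
        train.append(line.strip().split())

# Read the matching labels, one line per document
labels = []
with open("/Users/rte/Documents/Labels.txt") as t:
    for line in t:
        labels.append(line.strip().split())

# Hold out 10% of the data as the test set
X_train, X_test, Y_train, Y_test = train_test_split(train, labels, test_size=0.1, random_state=42)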

When I try the method given in the scikit-learn documentation, I get an error:

from sklearn.cross_validation import KFold  # pre-0.20 API, as on the linked documentation page

kf = KFold(len(train), n_folds=3)

for train_index, test_index in kf:
    X_train, X_test = train[train_index], train[test_index]
    y_train, y_test = labels[train_index], labels[test_index]

Error:

X_train, X_test = train[train_index],train[test_index] 
TypeError: only integer arrays with one element can be converted to an index
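The error comes from indexing a plain Python list with the NumPy index array that `KFold` yields: a list accepts only an integer or a slice as an index. A minimal reproduction, with made-up toy data standing in for the real documents (the exact message text differs between NumPy versions):

```python
import numpy as np

# Toy stand-in for the tokenized documents (hypothetical data).
docs = [["a", "b"], ["c"], ["d", "e"], ["f"]]
train_index = np.array([0, 2])  # the kind of index array KFold yields

try:
    docs[train_index]  # a list cannot be indexed with a multi-element array
    error_raised = False
except TypeError as exc:
    error_raised = True
    print("TypeError:", exc)

# Indexing element by element works, because each index is a single integer.
picked = [docs[i] for i in train_index]
print(picked)
```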

How can I perform 10-fold cross-validation on my documents and labels?


2016-02-03 minks

What have you tried so far to get K-fold cross-validation working? Have you seen the example on the documentation page (http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.KFold.html#sklearn.cross_validation.KFold)? –

Yes, I tried the example given there on my documents and labels, but I get an error: *X_train, X_test = train[train_index], train[test_index] TypeError: only integer arrays with one element can be converted to an index* – minks

There are two ways to fix this error:

First way:

Convert your data to NumPy arrays:

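import numpy as np
[...]
# dtype=object keeps variable-length token lists intact
# (recent NumPy versions raise a ValueError for ragged lists without it)
train = np.array(train, dtype=object)
labels = np.array(labels, dtype=object)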

Your current code should then work unchanged.
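In scikit-learn 0.18 and later, `KFold` lives in `sklearn.model_selection`, takes `n_splits` instead of `n_folds`, and produces the index arrays through its `split()` method; the old `sklearn.cross_validation` module was removed in 0.20. A sketch of 10-fold cross-validation with the NumPy-array approach, assuming one label per line and using toy data in place of the two files:

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy stand-ins for the tokenized documents and labels (hypothetical data).
# Documents have different lengths (1, 2 or 3 tokens), as real text would.
train = [["doc"] * (i % 3 + 1) for i in range(20)]
labels = [str(i % 2) for i in range(20)]

# dtype=object lets NumPy hold token lists of different lengths.
X = np.array(train, dtype=object)
y = np.array(labels)

kf = KFold(n_splits=10, shuffle=True, random_state=42)
fold_sizes = []
for train_index, test_index in kf.split(X):
    # NumPy arrays accept index arrays directly.
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    fold_sizes.append((len(X_train), len(X_test)))

print(fold_sizes)
```

With 10 folds, each iteration holds out about 10% of the data, the same ratio as `test_size=0.1`, but every document ends up in a test set exactly once.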

Second way:

Use list comprehensions to index the train and labels lists with train_index and test_index:

for train_index, test_index in kf:
    # Pick list elements one index at a time
    X_train, X_test = [train[i] for i in train_index], [train[j] for j in test_index]
    y_train, y_test = [labels[i] for i in train_index], [labels[j] for j in test_index]

(For this solution, see also the related question "index list with another list".)
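The list-comprehension approach written against the current `sklearn.model_selection.KFold` API, as a sketch with toy data in place of the files. `split()` accepts plain lists, so no conversion is needed. When the number of samples is not divisible by `n_splits`, scikit-learn makes the first `n_samples % n_splits` folds one sample larger:

```python
from sklearn.model_selection import KFold

# Toy stand-ins for the tokenized documents and labels (hypothetical data).
train = [["doc", str(i)] for i in range(25)]
labels = [str(i % 2) for i in range(25)]

kf = KFold(n_splits=10)  # no shuffling: each test fold is a contiguous block
seen_in_test = []
for train_index, test_index in kf.split(train):
    # Plain lists stay plain lists; elements are picked one index at a time.
    X_train = [train[i] for i in train_index]
    X_test = [train[j] for j in test_index]
    y_train = [labels[i] for i in train_index]
    y_test = [labels[j] for j in test_index]
    seen_in_test.extend(test_index.tolist())

# Every document is used for testing exactly once across the 10 folds.
print(sorted(seen_in_test) == list(range(len(train))))
```

Here the first 5 folds hold out 3 documents and the last 5 hold out 2, for 25 in total.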


2016-02-03 15:05:16
