GridSearch with SVM produces IndexError

I'm building a classifier with an SVM and want to run a grid search to help find the best model automatically. When I run the code below:

from sklearn.svm import SVC 
from sklearn.model_selection import train_test_split 
from sklearn.model_selection import GridSearchCV 
from sklearn.multiclass import OneVsRestClassifier 

X.shape  # (22343, 323) 
y.shape  # (22343, 1) 

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
) 

tuned_parameters = [ 
    { 
    'estimator__kernel': ['rbf'], 
    'estimator__gamma': [1e-3, 1e-4], 
    'estimator__C': [1, 10, 100, 1000] 
    }, 
    { 
    'estimator__kernel': ['linear'], 
    'estimator__C': [1, 10, 100, 1000] 
    } 
] 

model_to_set = OneVsRestClassifier(SVC(), n_jobs=-1) 
clf = GridSearchCV(model_to_set, tuned_parameters) 
clf.fit(X_train, y_train)

I get the following error message (this is not the whole stack trace, just the last 3 calls):

---------------------------------------------------- 
/anaconda/lib/python3.5/site-packages/sklearn/model_selection/_split.py in split(self, X, y, groups) 
    88   X, y, groups = indexable(X, y, groups) 
    89   indices = np.arange(_num_samples(X)) 
---> 90   for test_index in self._iter_test_masks(X, y, groups): 
    91    train_index = indices[np.logical_not(test_index)] 
    92    test_index = indices[test_index] 

/anaconda/lib/python3.5/site-packages/sklearn/model_selection/_split.py in _iter_test_masks(self, X, y, groups) 
    606 
    607  def _iter_test_masks(self, X, y=None, groups=None): 
--> 608   test_folds = self._make_test_folds(X, y) 
    609   for i in range(self.n_splits): 
    610    yield test_folds == i 

/anaconda/lib/python3.5/site-packages/sklearn/model_selection/_split.py in _make_test_folds(self, X, y, groups) 
    593   for test_fold_indices, per_cls_splits in enumerate(zip(*per_cls_cvs)): 
    594    for cls, (_, test_split) in zip(unique_y, per_cls_splits): 
--> 595     cls_test_folds = test_folds[y == cls] 
    596     # the test split can be too big because we used 
    597     # KFold(...).split(X[:max(c, n_splits)]) when data is not 100% 

IndexError: too many indices for array

Additionally, when I reshape the array so that y is (22343,), the GridSearch never finishes, even with tuned_parameters set to the default values.
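As a minimal sketch of the reshaping mentioned above: the traceback fails at `test_folds[y == cls]`, and indexing a 1-D array with a 2-D boolean mask is one way NumPy raises this kind of IndexError. The arrays below are small made-up stand-ins for the real labels, not the actual data.

```python
import numpy as np

# Made-up stand-in for the real labels: a column vector, like the (22343, 1) array.
y_column = np.array([[0], [1], [2], [1], [0]])

# ravel() flattens the column vector to shape (5,).
y_flat = y_column.ravel()

# A 1-D array comparable to the test_folds array seen in the traceback.
test_folds = np.zeros(len(y_flat), dtype=int)

# With 1-D labels the boolean mask is 1-D and the indexing works.
selected = test_folds[y_flat == 1]

# With column-vector labels the mask is 2-D and the indexing fails.
raised = False
try:
    test_folds[y_column == 1]
except IndexError:
    raised = True

print(y_column.shape, y_flat.shape, raised)
```

Passing the flattened labels (for example `y.ravel()`) to `train_test_split` and `fit` keeps the labels 1-D throughout.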

Here are the versions of all the packages, in case that helps:

Python: 3.5.2

scikit-learn: 0.18

pandas: 0.19.0

2016-10-06 William Gottschalk

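Have you tried reducing the number of samples and running it? – MMF

It seems there is nothing wrong with your implementation.

However, as the sklearn documentation mentions, "The fit time complexity is more than quadratic with the number of samples which makes it hard to scale to dataset with more than a couple of 10000 samples." See the documentation here.

In your case you have 22343 samples, which can lead to computational or memory problems. That is why the default CV takes so long. Try reducing your training set to 10000 samples or fewer.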
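A minimal sketch of that suggestion, under stated assumptions: the data is synthetic (generated with `make_classification`), and `max_train`, the reduced parameter grid, and `cv=3` are illustrative choices that keep the example fast. With the real data, `max_train` would be 10000 or less.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the real data (assumption: the real X and y are not shown).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, n_classes=3, random_state=0
)

# Keep only a stratified subset of the samples for the search.
# max_train is a hypothetical name; 200 is chosen only to keep this sketch quick.
max_train = 200
X_small, _, y_small, _ = train_test_split(
    X, y, train_size=max_train, stratify=y, random_state=0
)

# A reduced grid, using the same parameter naming as the question.
param_grid = {
    "estimator__kernel": ["rbf"],
    "estimator__gamma": [1e-3, 1e-4],
    "estimator__C": [1, 10],
}

clf = GridSearchCV(OneVsRestClassifier(SVC()), param_grid, cv=3)
clf.fit(X_small, y_small)
print(clf.best_params_)
```

Stratifying the subsample keeps the class proportions of the full set, so the smaller search stays representative of the original problem.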

2016-10-06 18:07:38 MMF