GridSearchCV with StratifiedKFold

I want to run GridSearchCV on a RandomForestClassifier, but my data is imbalanced, so I use StratifiedKFold:

from sklearn.model_selection import StratifiedKFold 
from sklearn.grid_search import GridSearchCV 
from sklearn.ensemble import RandomForestClassifier 

param_grid = {'n_estimators':[10, 30, 100, 300], "max_depth": [3, None], 
      "max_features": [1, 5, 10], "min_samples_leaf": [1, 10, 25, 50], "criterion": ["gini", "entropy"]} 

rfc = RandomForestClassifier() 

clf = GridSearchCV(rfc, param_grid=param_grid, cv=StratifiedKFold()).fit(X_train, y_train)

But I get an error:

TypeError         Traceback (most recent call last) 
<ipython-input-597-b08e92c33165> in <module>() 
    9 rfc = RandomForestClassifier() 
    10 
---> 11 clf = GridSearchCV(rfc, param_grid=param_grid, cv=StratifiedKFold()).fit(X_train, y_train) 

c:\python34\lib\site-packages\sklearn\grid_search.py in fit(self, X, y) 
    811 
    812   """ 
--> 813   return self._fit(X, y, ParameterGrid(self.param_grid)) 

c:\python34\lib\site-packages\sklearn\grid_search.py in _fit(self, X, y, parameter_iterable) 
    559          self.fit_params, return_parameters=True, 
    560          error_score=self.error_score) 
--> 561     for parameters in parameter_iterable 
    562     for train, test in cv) 

c:\python34\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self, iterable) 
    756    # was dispatched. In particular this covers the edge 
    757    # case of Parallel used with an exhausted iterator. 
--> 758    while self.dispatch_one_batch(iterator): 
    759     self._iterating = True 
    760    else: 

c:\python34\lib\site-packages\sklearn\externals\joblib\parallel.py in dispatch_one_batch(self, iterator) 
    601 
    602   with self._lock: 
--> 603    tasks = BatchedCalls(itertools.islice(iterator, batch_size)) 
    604    if len(tasks) == 0: 
    605     # No more tasks available in the iterator: tell caller to stop. 

c:\python34\lib\site-packages\sklearn\externals\joblib\parallel.py in __init__(self, iterator_slice) 
    125 
    126  def __init__(self, iterator_slice): 
--> 127   self.items = list(iterator_slice) 
    128   self._size = len(self.items) 

c:\python34\lib\site-packages\sklearn\grid_search.py in <genexpr>(.0) 
    560          error_score=self.error_score) 
    561     for parameters in parameter_iterable 
--> 562     for train, test in cv) 
    563 
    564   # Out is a list of triplet: score, estimator, n_test_samples 

TypeError: 'StratifiedKFold' object is not iterable

When I write cv=StratifiedKFold(y_train) I get ValueError: The number of folds must be of Integral type. But when I write cv=5, it works.

I don't understand what is wrong with StratifiedKFold.

Source

2016-10-26 user183897

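The API changed in the latest version. You used to pass y to the constructor; now you pass only the number of folds when creating the StratifiedKFold object, and you pass y later.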
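A minimal sketch of the new call order, assuming scikit-learn 0.18 or later. The arrays X and y below are made-up toy data, not the asker's:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy imbalanced data (hypothetical): 20 negatives, 5 positives.
X = np.arange(50).reshape(25, 2)
y = np.array([0] * 20 + [1] * 5)

# New API: only the number of folds goes to the constructor ...
skf = StratifiedKFold(n_splits=5)

# ... and the labels are passed later, to split().
test_positives = [int(y[test_idx].sum()) for _, test_idx in skf.split(X, y)]
print(test_positives)  # one positive sample lands in each test fold
```

Because the folds are stratified, the 5 positive samples are spread evenly over the 5 test folds.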

Source

2016-10-26 08:46:39 simon

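I wrote 'cv = StratifiedKFold(10)' and got 'TypeError: 'StratifiedKFold' object is not iterable'. When should I pass y? – user183897

In the current version, import sklearn.model_selection.StratifiedKFold. Then you can do cv = StratifiedKFold(10) and there should be no error. However, perhaps you are importing from the previous module, which still exists for compatibility until version 0.20. – simon

May I ask one more question? I downloaded the file scikit_learn-0.18-cp34-cp34m-win32.whl from http://www.lfd.uci.edu/~gohlke/pythonlibs/#scikit-learn and installed it, but now I get 'ImportError: DLL load failed: %1 is not a valid Win32 application.' What is wrong? – user183897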

It seems that cv=StratifiedKFold()).fit(X_train, y_train) should be changed to cv=StratifiedKFold()).split(X_train, y_train).

Source

2017-01-14 19:19:36 ebrahimi

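This has nothing to do with the error. The line clf = GridSearchCV(rfc, param_grid=param_grid, cv=StratifiedKFold()).fit(X_train, y_train) simply defines the object clf and then calls its fit method to train it. – sera

@rll also mentioned that fit should be replaced by split. – ebrahimi

The problem here is an API change, as the other answers mention, but the answers could be more explicit.

The documentation for the cv parameter states: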

cv : int, cross-validation generator or an iterable, optional

Determines the cross-validation splitting strategy. Possible inputs for cv are:

None, to use the default 3-fold cross-validation,

integer, to specify the number of folds.

An object to be used as a cross-validation generator.

An iterable yielding train/test splits.

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.

So, whichever cross-validation strategy is used, all that is needed is to provide the generator by calling the split function, as suggested:

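from sklearn.model_selection import GridSearchCV, StratifiedKFold

# estimator, parameters, qwk (a scorer), xtrain and ytrain are defined elsewhere
kfolds = StratifiedKFold(5)
clf = GridSearchCV(estimator, parameters, scoring=qwk, cv=kfolds.split(xtrain, ytrain))
clf.fit(xtrain, ytrain)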
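As a hedged illustration of the "iterable yielding train/test splits" option: split() yields (train_indices, test_indices) pairs, and a generator can be consumed only once, so the sketch below stores the pairs in a list before handing them to the search. The toy data, the LogisticRegression estimator, and the small grid are assumptions chosen only to keep the example short:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Toy imbalanced data (hypothetical): 30 negatives, 10 positives.
rng = np.random.RandomState(0)
xtrain = rng.rand(40, 3)
ytrain = np.array([0] * 30 + [1] * 10)

kfolds = StratifiedKFold(n_splits=5)

# Materialize the folds: a list of (train_idx, test_idx) index arrays.
splits = list(kfolds.split(xtrain, ytrain))

# Pass the precomputed splits as the cv argument.
clf = GridSearchCV(LogisticRegression(), {"C": [0.1, 1.0]}, cv=splits)
clf.fit(xtrain, ytrain)

n_folds = len(splits)
print(n_folds, clf.best_params_)
```

Keeping the list around also makes it possible to inspect the folds or reuse exactly the same ones for another model.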

Source

2017-06-01 14:34:07 rll

I had exactly the same problem.

The solution that worked for me was to replace:

from sklearn.grid_search import GridSearchCV

with

from sklearn.model_selection import GridSearchCV

Then it should work fine.
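For reference, a minimal end-to-end sketch with both classes imported from sklearn.model_selection, assuming scikit-learn 0.18 or later. The random data and the reduced parameter grid are placeholders, not the asker's X_train, y_train, or grid:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Toy imbalanced data (hypothetical): 48 negatives, 12 positives.
rng = np.random.RandomState(0)
X_train = rng.rand(60, 4)
y_train = np.array([0] * 48 + [1] * 12)

# A deliberately small grid so the search finishes quickly.
param_grid = {"n_estimators": [5, 10], "max_depth": [3, None]}

# The splitter object is passed as-is; GridSearchCV calls split() itself.
clf = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    cv=StratifiedKFold(n_splits=3),
).fit(X_train, y_train)

print(sorted(clf.best_params_))
```

With matching imports, the StratifiedKFold instance does not need to be iterable, so the TypeError from the question does not occur.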

Source

2017-06-01 15:00:41 sera

In sklearn version '0.18.1',

GridSearchCV(estimator, param_grid=param_grid, cv=5)

uses a StratifiedKFold with 5 splits when the estimator is a classifier.

Documentation:

> cv : int, cross-validation generator or an iterable, optional 
>   Determines the cross-validation splitting strategy. 
>   Possible inputs for cv are: 
>   - None, to use the default 3-fold cross validation, 
>   - integer, to specify the number of folds in a `(Stratified)KFold`, 
>   - An object to be used as a cross-validation generator. 
>   - An iterable yielding train, test splits.
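One way to check this behaviour is sklearn.model_selection.check_cv, which, as far as I know, is the helper the search classes use to resolve the cv argument. The sketch below assumes a classifier and a made-up binary label array:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, check_cv

# Toy binary labels (hypothetical): 20 negatives, 5 positives.
y = np.array([0] * 20 + [1] * 5)

# Resolve an integer cv for a classifier with binary labels.
cv = check_cv(5, y, classifier=True)

print(type(cv).__name__, cv.get_n_splits())
```

With classifier=False, or with a continuous target, the same call returns a plain KFold instead.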

Source

2017-10-19 20:37:59
