Error in scikit-learn cross_val_score

Please refer to this part of the code from the notebook,

scores = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=10) 
print scores 
print scores.mean()

which, when run on a Windows 7 64-bit machine, produces the following error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-37-4a10affe67c7> in <module>()
      1 # evaluate the model using 10-fold cross-validation
----> 2 scores = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=10)
      3 print scores
      4 print scores.mean()

C:\Python27\lib\site-packages\sklearn\cross_validation.pyc in cross_val_score(estimator, X, y, scoring, cv, n_jobs, verbose, fit_params, score_func, pre_dispatch)
   1140             allow_nans=True, allow_nd=True)
   1141
-> 1142     cv = _check_cv(cv, X, y, classifier=is_classifier(estimator))
   1143     scorer = check_scoring(estimator, score_func=score_func, scoring=scoring)
   1144     # We clone the estimator to make sure that all the folds are

C:\Python27\lib\site-packages\sklearn\cross_validation.pyc in _check_cv(cv, X, y, classifier, warn_mask)
   1366     if classifier:
   1367         if type_of_target(y) in ['binary', 'multiclass']:
-> 1368             cv = StratifiedKFold(y, cv, indices=needs_indices)
   1369         else:
   1370             cv = KFold(_num_samples(y), cv, indices=needs_indices)

C:\Python27\lib\site-packages\sklearn\cross_validation.pyc in __init__(self, y, n_folds, indices, shuffle, random_state)
    428     for test_fold_idx, per_label_splits in enumerate(zip(*per_label_cvs)):
    429         for label, (_, test_split) in zip(unique_labels, per_label_splits):
--> 430             label_test_folds = test_folds[y == label]
    431             # the test split can be too big because we used
    432             # KFold(max(c, self.n_folds), self.n_folds) instead of

IndexError: too many indices for array

I am using scikit-learn 0.15.2. It was suggested here that this may be a problem specific to Windows 7 64-bit machines.

============== Update ==============

I found that the following code actually works:

from sklearn.cross_validation import KFold 
cv = KFold(X.shape[0], 10, shuffle=True, random_state=33) 
scores = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=cv) 
print scores
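A plausible reading of why this workaround behaves differently: the traceback shows that an integer cv with a classifier leads cross_val_score to build a StratifiedKFold from y, while an explicit KFold splits by sample position only and never indexes by label. In current scikit-learn releases the sklearn.cross_validation module is gone (it was removed in 0.20), and the same idea is written with sklearn.model_selection. The sketch below assumes a recent scikit-learn and uses synthetic data from make_classification as a stand-in for the X and y of the question; it is not the original notebook's data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the question's X and y (assumption: binary labels).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# KFold now takes n_splits rather than the number of samples.
cv = KFold(n_splits=10, shuffle=True, random_state=33)
scores = cross_val_score(LogisticRegression(), X, y, scoring="accuracy", cv=cv)

print(scores)         # one accuracy value per fold
print(scores.mean())  # average accuracy across the 10 folds
```

Note that plain KFold does not preserve class proportions in each fold, so scores can differ slightly from the stratified default.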

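============== Update 2 ==============

It seems that, due to some package updates, I can no longer reproduce this error on my machine. If you run into the same problem on a Windows 7 64-bit machine, please let me know.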

Source

2014-10-22 tesla1060

What is the shape of 'y'? – 2014-10-22 12:01:34

@larsmans (6366L,) – tesla1060 2014-10-22 14:36:18

Is 'cv' the only difference between what works and what doesn't? Is 'X.shape[0] == 6366' as well? – eickenberg 2014-10-22 14:37:44

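I ran into the same error and was looking for an answer when I found this question.

I am using the same sklearn.cross_validation.cross_val_score (with a different algorithm) on the same kind of machine: Windows 7, 64-bit.

I tried your solution above and it "worked", but it gave me the following warning:

C:\Users\E245713\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\cross_validation.py:1531: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). estimator.fit(X_train, y_train, **fit_params)

After reading the warning, I figured the problem had something to do with the shape of 'y' (my label column). The keyword to try from the warning is "ravel()". So I tried the following: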

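import pandas as pd

# 'label' is the single-column DataFrame holding the labels
y_arr = pd.DataFrame.as_matrix(label)
print(y_arr)
print(y_arr.shape)  # prints the array's dimensions

This gave me:

[[1]
 [0]
 [1]
 ...,
 [0]
 [0]
 [1]]

(87939, 1)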

When I added 'ravel()':

y_arr = pd.DataFrame.as_matrix(label).ravel()
print(y_arr)
print(y_arr.shape)

it gave me:

[1 0 1 ..., 0 0 1] 

(87939,)
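On newer pandas the call above needs adjusting, because DataFrame.as_matrix was deprecated and then removed in pandas 1.0. A minimal sketch of the same flattening with DataFrame.to_numpy() follows; the six-row 'label' frame is a made-up stand-in for the real label column.

```python
import pandas as pd

# Hypothetical single-column label frame standing in for `label`.
label = pd.DataFrame({"label": [1, 0, 1, 0, 0, 1]})

y_col = label.to_numpy()          # 2-D column vector, one row per sample
y_arr = label.to_numpy().ravel()  # flattened to the 1-D array estimators expect

print(y_col.shape)
print(y_arr.shape)
```

Selecting the column as a Series first (for example label["label"]) would also yield 1-D data without needing ravel().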

The shape of 'y_arr' must be (87939,), not (87939, 1). After that change, my original cross_val_score call worked without adding the KFold code.
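One way to see how a column-shaped y could produce "too many indices for array" at line 430 of the traceback (test_folds[y == label]) is to reproduce that indexing with NumPy alone. This is a sketch under the assumption that test_folds is 1-D and y has shape (n, 1), as in my case; the tiny arrays are hypothetical, and the asker reported a 1-D y, so their cause may have been different.

```python
import numpy as np

# Hypothetical miniature labels shaped like a single-column DataFrame: (6, 1).
y_col = np.array([[1], [0], [1], [0], [0], [1]])
test_folds = np.zeros(len(y_col), dtype=int)  # 1-D, like test_folds in the traceback

try:
    test_folds[y_col == 1]  # 2-D boolean mask applied to a 1-D array
    raised = False
except IndexError as exc:
    raised = True
    print("IndexError:", exc)

# ravel() flattens the labels to one dimension, so the mask is 1-D again.
y_flat = y_col.ravel()
selected = test_folds[y_flat == 1]
print(y_col.shape, y_flat.shape, selected.shape)
```

With the flattened labels the boolean mask has the same number of dimensions as test_folds, so the indexing succeeds.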

Hope this helps.

Source

2016-07-20 21:28:49 wi3o
