Recursive feature elimination with cross-validation for regression in scikit-learn

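I want to apply a wrapper method such as recursive feature elimination to my regression problem with scikit-learn. "Recursive feature elimination with cross-validation" gives a good overview of how to tune the number of features automatically.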

I tried this:

import matplotlib.pyplot as plt
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

modelX = LogisticRegression()
rfecv = RFECV(estimator=modelX, step=1, scoring='mean_absolute_error')
rfecv.fit(df_normdf, y_train)
print("Optimal number of features : %d" % rfecv.n_features_)

# Plot number of features VS. cross-validation scores
plt.figure()
plt.xlabel("Number of features selected")
plt.ylabel("Cross validation score (nb of correct classifications)")
plt.plot(range(1, len(rfecv.grid_scores_) + 1), rfecv.grid_scores_)
plt.show()

but I get a message like this:

The least populated class in y has only 1 members, which is too few.
The minimum number of labels for any class cannot be less than n_folds=3. % (min_labels, self.n_folds)), Warning)

The warning sounds as if I have a classification problem, but my task is a regression problem. What is going wrong, and what can I do to get a result?

Source

2016-11-16 matthew

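Can you show us your 'y_train'? – MMF

My y_train has 1 column and ~10,000 rows, with values between 1 and 200. – matthew

Are the values integers? If so, I think it is being treated as a multiclass classification problem. Try converting the values to floats. – MMF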

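Here is what is happening:

By default, when the user does not specify the number of folds, the cross-validated RFE uses 3-fold cross-validation. So far so good.

However, if you look at the documentation, it also uses StratifiedKFold, which ensures that the folds are created by preserving the proportion of samples of each class. Since it seems (according to the error) that some values in your output y occur only once, they cannot be present in 3 different folds at the same time. That is what triggers the error!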

The error comes from here.
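The stratification problem described above can be reproduced in isolation. The following is a minimal sketch on hypothetical toy data (the arrays `X` and `y` are invented for illustration and are not the asker's data); it only assumes that StratifiedKFold and KFold from sklearn.model_selection are available, as in recent scikit-learn releases. The exact wording of the warning may differ between versions.

```python
import warnings

import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical toy data: 11 samples, and the label 2 occurs only once.
X = np.arange(22, dtype=float).reshape(11, 2)
y = np.array([0] * 5 + [1] * 5 + [2])

# StratifiedKFold tries to keep class proportions in every fold, so a label
# with a single member cannot be spread over 3 folds and a warning is issued.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    stratified_folds = list(StratifiedKFold(n_splits=3).split(X, y))

for w in caught:
    print(w.message)

# Plain KFold ignores the label values, so the same data splits without
# any complaint about class membership.
with warnings.catch_warnings(record=True) as caught_plain:
    warnings.simplefilter("always")
    plain_folds = list(KFold(n_splits=3).split(X, y))

print("KFold produced", len(plain_folds), "folds")
```

With a regression target taking values between 1 and 200, rare values play the role of the single-member label here.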

So you need to use a non-stratified K-fold: KFold.
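A minimal sketch of passing a KFold splitter to RFECV explicitly is shown below. Several things here are assumptions of the sketch rather than part of the answer: the data is synthetic (make_regression stands in for the asker's `df_normdf` and `y_train`), LinearRegression is swapped in as the estimator because the task is a regression, and the scoring name 'neg_mean_absolute_error' is the spelling used by newer scikit-learn releases in place of 'mean_absolute_error'.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Synthetic stand-in for the asker's data (assumption): 200 rows, 10 features,
# 4 of which actually influence the target.
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=4, noise=5.0, random_state=0
)

# Pass a non-stratified splitter explicitly instead of relying on the default.
rfecv = RFECV(
    estimator=LinearRegression(),  # a regressor, swapped in for this sketch
    step=1,
    cv=KFold(n_splits=3, shuffle=True, random_state=0),
    scoring="neg_mean_absolute_error",
)
rfecv.fit(X, y)

print("Optimal number of features : %d" % rfecv.n_features_)
print("Selected feature mask:", rfecv.support_)
```

Passing the splitter through `cv` makes the choice of folds independent of how the target values happen to be typed.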

The RFECV documentation says: "If the estimator is a classifier or if y is neither binary nor multiclass, sklearn.model_selection.KFold is used."
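Whether y counts as "binary", "multiclass" or something else can be inspected directly. The sketch below uses type_of_target from sklearn.utils.multiclass on small hypothetical arrays (the values are invented, loosely following the 1 to 200 range mentioned in the comments). It is meant as a diagnostic aid, assuming the cross-validation default relies on this kind of target-type check.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical targets for illustration only.
y_int = np.array([1, 17, 42, 200, 17, 1])            # integer values
y_int_as_float = y_int.astype(float)                 # same values, float dtype
y_real = np.array([1.5, 17.25, 42.0, 199.9, 3.3])    # genuinely fractional values

print(type_of_target(y_int))           # integers are seen as class labels
print(type_of_target(y_int_as_float))  # whole-number floats are still class-like
print(type_of_target(y_real))          # fractional values are continuous
```

If the middle case still reports a class-like type, converting the dtype alone may not be enough, and passing an explicit KFold is the more direct route.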

Source

2016-11-16 13:26:22 MMF
