Scores of RFECV() in scikit-learn

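The scikit-learn library supports recursive feature elimination (RFE) and its cross-validated version (RFECV). RFECV is very useful to me because it selects a small set of features, but I wonder how the cross-validation of RFE is done.

RFE works by repeatedly removing the least important features, so I assumed RFECV computes a cross-validation score while removing features one at a time.

But if cross-validation is used, I would expect each fold to pick a different least important feature, because each fold sees different data. Does anyone know how features are removed in RFECV?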

2016-01-10 z991

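The cross-validation is done over the number of features. Each CV iteration updates the score for each number of removed features.

It then uses those scores to pick n_features_to_select, the number of features to keep, and runs RFE on the full dataset, keeping only n_features_to_select features.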

for n, (train, test) in enumerate(cv):
    X_train, y_train = _safe_split(self.estimator, X, y, train)
    X_test, y_test = _safe_split(self.estimator, X, y, test, train)

    rfe = RFE(estimator=self.estimator,
              n_features_to_select=n_features_to_select,
              step=self.step, estimator_params=self.estimator_params,
              verbose=self.verbose - 1)

    rfe._fit(X_train, y_train, lambda estimator, features:
             _score(estimator, X_test[:, features], y_test, scorer))
    scores.append(np.array(rfe.scores_[::-1]).reshape(1, -1))
scores = np.sum(np.concatenate(scores, 0), 0)
# The index in 'scores' when 'n_features' features are selected
n_feature_index = np.ceil((n_features - n_features_to_select) /
                          float(self.step))
n_features_to_select = max(n_features_to_select,
                           n_features - ((n_feature_index -
                                          np.argmax(scores)) *
                                         self.step))
# Re-execute an elimination with best_k over the whole set
rfe = RFE(estimator=self.estimator,
          n_features_to_select=n_features_to_select,
          step=self.step, estimator_params=self.estimator_params)
rfe.fit(X, y)

(The code above is from the scikit-learn source.)
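For readers who want to experiment with this, here is a minimal sketch of the same idea using only the public API. It is not the library's implementation: it refits RFE once per candidate feature count instead of scoring inside a single elimination pass, and the dataset, the LogisticRegression estimator, and the names fold_scores, summed, and best_k are my own assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Toy data and estimator: illustrative assumptions, not from the question.
X, y = make_classification(n_samples=200, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)
estimator = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5)

n_features = X.shape[1]
# fold_scores[i, k - 1] = test score on fold i when k features are kept
fold_scores = np.zeros((cv.get_n_splits(), n_features))

for i, (train, test) in enumerate(cv.split(X, y)):
    for k in range(1, n_features + 1):
        # The elimination is recomputed on this fold's training data only.
        rfe = RFE(estimator, n_features_to_select=k, step=1)
        rfe.fit(X[train], y[train])
        fold_scores[i, k - 1] = rfe.score(X[test], y[test])

# Aggregate the scores per feature count across folds, then pick the best count.
summed = fold_scores.sum(axis=0)
best_k = int(np.argmax(summed)) + 1

# Final elimination over the whole dataset, keeping best_k features.
final_rfe = RFE(estimator, n_features_to_select=best_k, step=1).fit(X, y)
print(best_k, final_rfe.support_)
```

Note that the folds only vote on how many features to keep. Which features are kept is decided by the last RFE run on the full data.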

2016-01-10 09:49:13

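Thank you very much. But I don't understand what 'cross-validation over the number of features' means. What I want to know is how the elimination order is determined. As far as I understand the code, RFE first runs over the whole dataset to produce the order (so only one run is used to make the order). Then the score is computed for each cross-validation fold, n features are removed using that order, and so on. Am I right? – z991

@z991 Not quite. RFE is run on every CV fold, and for each number of features we keep the score averaged over all CV folds. Then we use the averaged scores to work out how many features to remove, and remove that many features using the whole dataset. –

Now I understand the flow. Thank you very much! – z991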
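To see the point from the comments in practice, the sketch below prints the elimination ranking that RFE produces on each fold's training data, then compares the mask chosen by RFECV with a plain RFE run on the full dataset using the same number of features. The toy data, the LogisticRegression estimator, and the variable names are assumptions for illustration. With a deterministic estimator I would expect the two masks to match, but treat that as something to verify on your own setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Toy data and estimator: illustrative assumptions.
X, y = make_classification(n_samples=200, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)
estimator = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5)

# Elimination order computed separately on each fold's training data.
# With n_features_to_select=1 and step=1, ranking_ is a full ordering.
fold_rankings = np.array([
    RFE(estimator, n_features_to_select=1, step=1)
    .fit(X[train], y[train]).ranking_
    for train, _ in cv.split(X, y)
])
print(fold_rankings)  # rows may differ from fold to fold

# RFECV chooses the number of features from the cross-validated scores...
rfecv = RFECV(estimator, step=1, cv=cv).fit(X, y)

# ...and the kept features come from an elimination on the full dataset.
full_rfe = RFE(estimator, n_features_to_select=rfecv.n_features_,
               step=1).fit(X, y)
same_mask = np.array_equal(rfecv.support_, full_rfe.support_)
print(rfecv.n_features_, rfecv.support_, same_mask)
```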
