
sklearn SelectKBest: which variables were selected?

I am trying to get sklearn to pick the best k variables for a linear regression (e.g. k = 1). This works and I can get the R-squared, but it does not tell me which variable is the best one. How can I find that out?

My code has the following form (the real variable list is much longer):

import numpy as np 
import sklearn.cross_validation as crossval 
import sklearn.feature_selection as fs 
from sklearn.linear_model import LinearRegression 

X = [] 
for i in range(len(df)): 
    X.append([averageindegree[i], indeg3_sum[i], indeg5_sum[i], indeg10_sum[i]]) 
X = np.array(X)  # as an array, so the boolean column indexing below works 


training = [] 
actual = [] 
counter = 0 
for fold in range(500): 
    # y is the target vector, assembled elsewhere to match the rows of X 
    X_train, X_test, y_train, y_test = crossval.train_test_split(X, y, test_size=0.3) 
    clf = LinearRegression() 
    #clf = RidgeCV() 
    #clf = LogisticRegression() 
    #clf=ElasticNetCV() 

    b = fs.SelectKBest(fs.f_regression, k=1) #k is number of features. 
    b.fit(X_train, y_train) 
    #print b.get_params 

    X_train = X_train[:, b.get_support()] 
    X_test = X_test[:, b.get_support()] 


    clf.fit(X_train,y_train) 
    sc = clf.score(X_train, y_train) 
    training.append(sc) 
    #print "The training R-Squared for fold " + str(1) + " is " + str(round(sc*100,1))+"%" 
    sc = clf.score(X_test, y_test) 
    actual.append(sc) 
    #print "The actual R-Squared for fold " + str(1) + " is " + str(round(sc*100,1))+"%" 

'b.get_support()', which you are already using, gives you a boolean mask of the selected features.


You are right, got it!
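For reference, here is a minimal sketch of how that boolean mask maps back to variable names; the list feature_names is an assumption, ordered the same way as the columns appended to X in the question:

import numpy as np 
import sklearn.feature_selection as fs 

# assumed name list, in the same order as the columns of X (see the question) 
feature_names = np.array(['averageindegree', 'indeg3_sum', 'indeg5_sum', 'indeg10_sum']) 

b = fs.SelectKBest(fs.f_regression, k=1) 
b.fit(X_train, y_train)     # X_train, y_train as in the question's loop 
mask = b.get_support()      # one boolean per original column 
print(feature_names[mask])  # name(s) of the selected column(s) 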

Answers


Try using b.fit_transform() instead of b.transform(). fit_transform() fits the selector and transforms your input X into a new X that contains only the selected features, and returns that new X.

... 
b = fs.SelectKBest(fs.f_regression, k=1) #k is number of features. 
X_train = b.fit_transform(X_train, y_train) 
#print b.get_params 
... 
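A small follow-up sketch, assuming b has been fitted as above: the fitted selector can still report which columns it kept, and the held-out fold should be reduced with the same selector:

X_test = b.transform(X_test)  # reduce the test fold with the already-fitted selector 
print(b.get_support())        # boolean mask of the kept column(s) 
print(b.scores_)              # f_regression score for every original column 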

You need to use get_support:

from sklearn.feature_selection import SelectKBest, f_regression 

features_columns = [.......] 
fs = SelectKBest(score_func=f_regression, k=5) 
fs.fit(X, y)  # fit on your data first; get_support() is only available after fitting 
print zip(fs.get_support(), features_columns) 
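To keep only the names of the selected columns, that same mask can be filtered, for example:

selected = [name for keep, name in zip(fs.get_support(), features_columns) if keep] 
print selected 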

What this does is configure SelectKBest with the scoring function you prefer (f_regression for your regression case), fit it to your data, and then read off which features it selected. My code assumes you have a list features_list containing the names of all your feature columns.

import numpy as np 
from sklearn.feature_selection import SelectKBest, f_regression 

features_list = np.array(features_list)        # as an array, so the boolean mask below can index it 
kb = SelectKBest(score_func=f_regression, k=5) # configure SelectKBest 
kb.fit(X, Y)                                   # fit it to your data 
# get_support() gives a boolean vector [False, False, True, False, ...] 
print(features_list[kb.get_support()]) 

Of course, you can write it more Pythonically than I did :-)
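A related sketch, assuming the features are kept in a pandas DataFrame (this DataFrame construction is an illustration, not the question's actual code); the selected names then come straight from the column index:

import pandas as pd 
from sklearn.feature_selection import SelectKBest, f_regression 

# hypothetical DataFrame built from the question's variables 
X_df = pd.DataFrame({'averageindegree': averageindegree, 'indeg3_sum': indeg3_sum, 
                     'indeg5_sum': indeg5_sum, 'indeg10_sum': indeg10_sum}) 
kb = SelectKBest(score_func=f_regression, k=1) 
kb.fit(X_df, y)                        # y is the target, as in the question 
print(X_df.columns[kb.get_support()])  # name(s) of the selected column(s) 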
