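sklearn SelectKBest: which variables were chosen?

I'm trying to get sklearn to select the best k variables (for example, k = 1) for a linear regression. This works and I can get the R-squared, but it doesn't tell me which variables were selected. How can I find that out?

I have code of the following form (the real list of variables is much longer):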
    import numpy as np
    from sklearn import feature_selection as fs
    from sklearn.linear_model import LinearRegression
    # `crossval` is assumed to be whichever module provides train_test_split
    # (sklearn.model_selection in current scikit-learn releases).

    X = []
    for i in range(len(df)):
        X.append([averageindegree[i], indeg3_sum[i], indeg5_sum[i], indeg10_sum[i]])
    X = np.array(X)  # an array is needed for the boolean-mask column indexing below

    training = []
    actual = []
    counter = 0
    for fold in range(500):
        X_train, X_test, y_train, y_test = crossval.train_test_split(X, y, test_size=0.3)
        clf = LinearRegression()
        #clf = RidgeCV()
        #clf = LogisticRegression()
        #clf=ElasticNetCV()
        b = fs.SelectKBest(fs.f_regression, k=1)  # k is the number of features to keep
        b.fit(X_train, y_train)
        #print b.get_params
        # Keep only the columns chosen by the selector
        X_train = X_train[:, b.get_support()]
        X_test = X_test[:, b.get_support()]
        clf.fit(X_train, y_train)
        sc = clf.score(X_train, y_train)
        training.append(sc)
        #print "The training R-Squared for fold " + str(1) + " is " + str(round(sc*100,1))+"%"
        sc = clf.score(X_test, y_test)
        actual.append(sc)
        #print "The actual R-Squared for fold " + str(1) + " is " + str(round(sc*100,1))+"%"
`b.get_support()`, which you are already using, gives you a boolean mask of the selected features.
You're right, got it!
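
To illustrate the comment above, here is a minimal sketch of turning the mask from `get_support()` into feature names. The `feature_names` array and the synthetic data are assumptions made for this example (the names are borrowed from the variables in the question); with real data you would keep your own list of column names in the same order as the columns of `X`.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

# Hypothetical column names, in the same order as the columns of X.
feature_names = np.array(
    ["averageindegree", "indeg3_sum", "indeg5_sum", "indeg10_sum"]
)

# Synthetic data (an assumption for illustration): y depends only on column 2.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 2] + rng.normal(scale=0.1, size=200)

selector = SelectKBest(f_regression, k=1).fit(X, y)

mask = selector.get_support()                 # boolean mask, one entry per column
indices = selector.get_support(indices=True)  # the same selection as integer indices
selected = feature_names[mask].tolist()       # names of the selected columns

print(selected)
print(indices)
```

Because the synthetic target here is built from the third column only, that column's name should be the one reported. Inside a cross-validation loop like the one in the question, the same lookup could be done after each `b.fit(...)` to see whether the chosen variable changes from fold to fold.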