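How to identify variable names in feature selection
I am trying to identify the names of the variables chosen by feature selection. I have a dataset like this: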
import pandas as pd
df = pd.DataFrame({'a': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
                   'b': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
                   'c': ['foo', 'bar', 'bar', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo', 'bar'],
                   'd': ['d', 'd', 'b', 'a', 'd', 'd', 'a', 'b', 'd', 'a']})
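I split it into features and target, and one-hot encode the features:
# .ix was removed in pandas 1.0; .iloc selects by position
X, y = df.iloc[:, 1:], df.iloc[:, [0]]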
X_dummy = pd.get_dummies(X)
Then I run the selection:
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
X_new = SelectKBest(chi2, k=4).fit_transform(X_dummy, y)
X_new
array([[0, 1, 0, 1],
[1, 0, 0, 1],
[0, 1, 0, 0],
[1, 0, 1, 0],
[0, 1, 0, 1],
[1, 0, 0, 1],
[0, 1, 1, 0],
[1, 0, 0, 0],
[0, 1, 0, 1],
[1, 0, 1, 0]], dtype=uint8)
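I get the array, but I would like to know which variables (b, c or d, or their dummy columns) should be included in the model. How can I find this out? Thanks!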
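One possible way to recover the names, as a minimal sketch rather than a definitive answer: keep the fitted `SelectKBest` object instead of calling `fit_transform` directly, then use its `get_support()` mask to index the dummy columns. The names `selector`, `mask`, `selected_columns` and `scores` are chosen here for illustration, and `dtype=int` in `get_dummies` is an assumption made so the dummies are numeric on recent pandas versions.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

df = pd.DataFrame({'a': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
                   'b': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
                   'c': ['foo', 'bar', 'bar', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo', 'bar'],
                   'd': ['d', 'd', 'b', 'a', 'd', 'd', 'a', 'b', 'd', 'a']})

# One-hot encode the categorical features; keep the target as a 1-D Series.
X_dummy = pd.get_dummies(df.iloc[:, 1:], dtype=int)
y = df.iloc[:, 0]

# Fit first and keep the selector so it can be inspected afterwards.
selector = SelectKBest(chi2, k=4).fit(X_dummy, y)

# Boolean mask over the input columns: True where a column was kept.
mask = selector.get_support()
selected_columns = X_dummy.columns[mask].tolist()

# chi2 score of every dummy column, highest first.
scores = pd.Series(selector.scores_, index=X_dummy.columns).sort_values(ascending=False)

print(selected_columns)
print(scores)
```

`selector.transform(X_dummy)` returns the same kind of array as `fit_transform`, with columns in the order of `selected_columns`. Note that several dummy columns in this small dataset have identical chi2 scores, so which of the tied columns fills the last slot depends on how the selector breaks ties.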