2014-01-24

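The code below reads the cleaned Titanic data and prints all of the features and their scores. How do I store and print the names and scores of the top 20% of the features?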

import numpy as np
from sklearn import feature_selection

# Read the CSV with its header row so the column names end up in data.dtype.names
data = np.genfromtxt('titanic.csv', dtype=float, delimiter=',', names=True)

feature_names = np.array(data.dtype.names)
feature_names = feature_names[[0, 1, 2, 3, 4]]

# Read the same file again as a plain 2-D float array, skipping the header row
data = np.genfromtxt('titanic.csv', dtype=float, delimiter=',', skip_header=1)

_X = data[:, [0, 1, 2, 3, 4]]
# Return a flattened array required by scikit-learn fit for 2nd argument
_y = np.ravel(data[:, [5]])

fs = feature_selection.SelectPercentile(feature_selection.chi2, percentile=20)
X_train_fs = fs.fit_transform(_X, _y)

print(feature_names, '\n', fs.scores_)

Result:

['A' 'B' 'C' 'D' 'E'] 
[ 4.7324711 89.1428574 70.23474577 7.02447375 52.42447817] 

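What I want to do is capture the top 20% of the features and store their names and scores in an array that I can then sort by score. This will help me reduce dimensionality on a larger feature set. Why do I get all 5 features back, how do I fix this, and how do I store and print the names and scores of the top 20% of the features?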

Answer

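You're almost there. The scores are indeed stored in fs.scores_, but that array always holds one score per input feature, which is why you see all 5. The features that are actually selected (according to the percentile you set) are stored in X_train_fs. Try printing X_train_fs.shape; it should have fewer than 5 columns (with percentile=20 and 5 features, just one).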
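A minimal sketch of reading the selection back out, assuming random placeholder data in place of the Titanic CSV (X, y and the names 'A' to 'E' are illustrative). The boolean mask returned by fs.get_support() marks which input columns were kept, so it can index both the names and fs.scores_.

```python
import numpy as np
from sklearn.feature_selection import SelectPercentile, chi2

# Placeholder data: chi2 needs non-negative features and class labels for y
rng = np.random.default_rng(0)
X = rng.random((100, 5))
y = rng.integers(0, 2, size=100)
feature_names = np.array(['A', 'B', 'C', 'D', 'E'])

fs = SelectPercentile(chi2, percentile=20)
X_train_fs = fs.fit_transform(X, y)

# scores_ holds a score for every input feature, selected or not
print(len(fs.scores_))   # 5
# The transformed matrix keeps only the selected columns
print(X_train_fs.shape)  # (100, 1): 20% of 5 features

# Boolean mask of the kept columns; use it to pull out their names and scores
mask = fs.get_support()
selected_names = feature_names[mask]
selected_scores = fs.scores_[mask]
print(selected_names, selected_scores)
```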

The code below can help with the sorting part:

import numpy as np
from sklearn import feature_selection

# Random demo data; chi2 needs non-negative features and class labels as the target
_X = np.random.random((100, 5))
_y = np.random.randint(0, 2, 100)
fs = feature_selection.SelectPercentile(feature_selection.chi2, percentile=20)
X_train_fs = fs.fit_transform(_X, _y)
feature_names = ['a', 'b', 'c', 'd', 'e']

print('All features:', feature_names)
print('Scores of these features:', fs.scores_)
# argsort returns indices in ascending score order; [::-1] puts the best feature first
print('***Features sorted by score:', [feature_names[i] for i in np.argsort(fs.scores_)[::-1]])
print('Peeking into first few samples (before and after):')
print(_X[:10])
print(X_train_fs[:10])

Output (the data are random, so the numbers differ from run to run):

All features: ['a', 'b', 'c', 'd', 'e'] 
Scores of these features: [ 17.08834764 13.97983442 18.0124008 17.79594679 14.77178022] 
***Features sorted by score: ['c', 'd', 'a', 'e', 'b'] 
Peeking into first few samples (before and after): 
[[ 0.34808143 0.79142591 0.75333429 0.69246515 0.29079619] 
[ 0.81726059 0.93065583 0.01183974 0.66227077 0.82216764] 
[ 0.8791751 0.21764549 0.06147596 0.01156631 0.22077268] 
[ 0.91079625 0.58496956 0.68548851 0.55365907 0.78447282] 
[ 0.24489774 0.88725231 0.32411121 0.09189075 0.83266337] 
[ 0.1041106 0.98683633 0.22545763 0.98577525 0.41408367] 
[ 0.09014649 0.51216454 0.62158409 0.94874742 0.81915236] 
[ 0.32828772 0.05461745 0.43343171 0.59472169 0.83159784] 
[ 0.33792151 0.47963184 0.08690499 0.31566743 0.26170533] 
[ 0.10012106 0.36240434 0.86687847 0.64894175 0.51167487]] 
[[ 0.75333429] 
[ 0.01183974] 
[ 0.06147596] 
[ 0.68548851] 
[ 0.32411121] 
[ 0.22545763] 
[ 0.62158409] 
[ 0.43343171] 
[ 0.08690499] 
[ 0.86687847]] 
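For the part of the question about keeping names and scores together in one array that can be sorted, one option is a NumPy structured array. This is a sketch with placeholder data; the field names 'name' and 'score' are illustrative choices, not part of scikit-learn. The top features are taken from fs.get_support() rather than by slicing a fixed count, because SelectPercentile's cut-off is computed from a percentile of the scores and does not always equal int(n * percentile / 100).

```python
import numpy as np
from sklearn.feature_selection import SelectPercentile, chi2

# Placeholder data standing in for the real feature matrix and class labels
rng = np.random.default_rng(1)
X = rng.random((100, 5))
y = rng.integers(0, 2, size=100)
feature_names = ['a', 'b', 'c', 'd', 'e']

fs = SelectPercentile(chi2, percentile=20)
fs.fit(X, y)

# One record per feature: (name, score)
records = np.array(list(zip(feature_names, fs.scores_)),
                   dtype=[('name', 'U10'), ('score', 'f8')])

# np.sort orders by the 'score' field ascending; [::-1] puts the best feature first
ranked = np.sort(records, order='score')[::-1]
print(ranked)

# Only the features the selector kept, also sorted best first
top = np.sort(records[fs.get_support()], order='score')[::-1]
print(top['name'], top['score'])
```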

Exactly what I needed, thank you so much for your help!