2017-07-25 48 views
3

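Find the column name in pandas that matches an array

I have a large DataFrame (5000 x 12039), and I want to get the name of the column whose values match a NumPy array.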

For example, if I have the table

 m1lenhr m1lenmin m1citywt m1a12a cm1age cm1numb m1b1a m1b1b m1b12a m1b12b ... kind_attention_scale_10 kind_attention_scale_22 kind_attention_scale_21 kind_attention_scale_15 kind_attention_scale_18 kind_attention_scale_19 kind_attention_scale_25 kind_attention_scale_24 kind_attention_scale_27 kind_attention_scale_23 
challengeID                     
1 0.130765 40.0 202.485367 1.893256 27.0 1.0 2.0 0.0 2.254198 2.289966 ... 0 0 0 0 0 0 0 0 0 0 
2 0.000000 40.0 45.608219 1.000000 24.0 1.0 2.0 0.0 2.000000 3.000000 ... 0 0 0 0 0 0 0 0 0 0 
3 0.000000 35.0 39.060299 2.000000 23.0 1.0 2.0 0.0 2.254198 2.289966 ... 0 0 0 0 0 0 0 0 0 0 
4 0.000000 30.0 22.304855 1.893256 22.0 1.0 3.0 0.0 2.000000 3.000000 ... 0 0 0 0 0 0 0 0 0 0 
5 0.000000 25.0 35.518272 1.893256 19.0 1.0 1.0 6.0 1.000000 3.000000 ... 0 

I would like to do this:

x = [40.0, 40.0, 35.0, 30.0, 25.0] 
find_column(x) 

and have find_column(x) return m1lenmin.

Answers

4

Approach #1

Here's a vectorized approach leveraging NumPy broadcasting -

df.columns[(df.values == np.asarray(x)[:,None]).all(0)] 

Sample run -

In [367]: df 
Out[367]: 
    0 1 2 3 4 5 6 7 8 9 
0 7 1 2 6 2 1 7 2 0 6 
1 5 4 3 3 2 1 1 1 5 5 
2 7 7 2 2 5 4 6 6 5 7 
3 0 5 4 1 5 7 8 2 2 4 
4 7 1 0 4 5 4 3 2 8 6 

In [368]: x = df.iloc[:,2].values.tolist() 

In [369]: x 
Out[369]: [2, 3, 2, 4, 0] 

In [370]: df.columns[(df.values == np.asarray(x)[:,None]).all(0)] 
Out[370]: Int64Index([2], dtype='int64') 
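To make the one-liner reusable as the find_column from the question, here is a minimal sketch that wraps it in a function. The two-argument signature (passing df explicitly), the length check, and the toy frame are my own assumptions for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd


def find_column(df, x):
    """Return the labels of all columns whose values equal x element-wise."""
    x = np.asarray(x)
    if x.shape[0] != len(df):
        raise ValueError("x must have one value per row of df")
    # x[:, None] has shape (n_rows, 1), so it broadcasts against every column
    mask = (df.to_numpy() == x[:, None]).all(axis=0)
    return df.columns[mask]


# Hypothetical toy data, not the full frame from the question
df = pd.DataFrame({
    "m1lenhr": [0.0, 0.0, 0.5],
    "m1lenmin": [40.0, 40.0, 35.0],
    "cm1age": [27.0, 24.0, 23.0],
})
print(find_column(df, [40.0, 40.0, 35.0]).tolist())  # ['m1lenmin']
```

It returns an Index rather than a single label because more than one column can match the array.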

Approach #2

Alternatively, here's another one using the concept of views -

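import numpy as np

def view1D(a, b): # a, b are 2D arrays with the same dtype and row length
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    # View each row as a single void scalar so whole rows compare at once
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()

# Columns of df become rows; x becomes a single row
df1D_arr, x1D = view1D(df.values.T,np.asarray(x)[None])
out = np.flatnonzero(df1D_arr==x1D) # positional indices of matching columns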

Sample run -

In [442]: df 
Out[442]: 
    0 1 2 3 4 5 6 7 8 9 
0 7 1 2 6 2 1 7 2 0 6 
1 5 4 3 3 2 1 1 1 5 5 
2 7 7 2 2 5 4 6 6 5 7 
3 0 5 4 1 5 7 8 2 2 4 
4 7 1 0 4 5 4 3 2 8 6 

In [443]: x = df.iloc[:,5].values.tolist() 

In [444]: df1D_arr, x1D = view1D(df.values.T,np.asarray(x)[None]) 

In [445]: np.flatnonzero(df1D_arr==x1D) 
Out[445]: array([5]) 
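One thing to watch with bytewise comparisons like this: x has to use the same dtype as the frame's values, otherwise equal numbers can have different bytes and silently fail to match. As a rough sketch of that point, here is a related bytewise variant that uses tobytes instead of the void view (so it is not the view1D above). The function name find_column_bytes is hypothetical, and it assumes a frame with a single numeric dtype.

```python
import numpy as np
import pandas as pd


def find_column_bytes(df, x):
    """Labels of columns whose raw bytes equal those of x (homogeneous numeric df assumed)."""
    arr = np.ascontiguousarray(df.to_numpy().T)      # one row per column
    # Cast x to the frame's dtype so equal values share the same byte pattern
    key = np.asarray(x, dtype=arr.dtype).tobytes()
    return [label for label, col in zip(df.columns, arr) if col.tobytes() == key]


# Same values as the sample run above
df = pd.DataFrame([
    [7, 1, 2, 6, 2, 1, 7, 2, 0, 6],
    [5, 4, 3, 3, 2, 1, 1, 1, 5, 5],
    [7, 7, 2, 2, 5, 4, 6, 6, 5, 7],
    [0, 5, 4, 1, 5, 7, 8, 2, 2, 4],
    [7, 1, 0, 4, 5, 4, 3, 2, 8, 6],
])
# Floats are cast to the frame's integer dtype before comparing
print(find_column_bytes(df, [1.0, 1.0, 4.0, 7.0, 4.0]))  # [5]
```

This loops over columns in Python, so it trades some speed for simplicity compared with the view approach.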
+1

Yeah, this one is more elegant! :) – MaxU

+0

Great solution! It works well and is efficient – amaatouq

5

Try this:

In [91]: x = np.array(x) 

In [94]: df.apply(lambda col: col.eq(x).all()) 
Out[94]: 
m1lenhr  False 
m1lenmin  True 
m1citywt False 
m1a12a  False 
cm1age  False 
cm1numb  False 
m1b1a  False 
m1b1b  False 
m1b12a  False 
m1b12b  False 
dtype: bool 

In [95]: df.columns[df.apply(lambda col: col.eq(x).all()).values] 
Out[95]: Index(['m1lenmin'], dtype='object')