2016-08-20 232 views

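Select columns of a pandas DataFrame based on a criterion

I have a DataFrame containing UK election results, with one column per party. The DataFrame looks like this: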

In[107]: Results.columns 
Out[107]: 
Index(['Press Association ID Number', 'Constituency Name', 'Region', 'Country', 
     'Constituency ID', 'Constituency Type', 'Election Year', 'Electorate', 
     ' Total number of valid votes counted ', 'Unnamed: 9', 
     ... 
     'Wessex Reg', 'Whig', 'Wigan', 'Worth', 'WP', 'WRP', 'WVPTFP', 'Yorks', 
     'Young', 'Zeb'], 
     dtype='object', length=147) 

For example:

Results.head(2) 
Out[108]: 
    Press Association ID Number Constituency Name Region Country \ 
0       1   Aberavon Wales Wales 
1       2   Aberconwy Wales Wales 

    Constituency ID Constituency Type Election Year Electorate \ 
0  W07000049   County   2015  49,821 
1  W07000058   County   2015  45,525 

    Total number of valid votes counted Unnamed: 9 ... Wessex Reg Whig \ 
0        31,523   NaN ...   NaN NaN 
1        30,148   NaN ...   NaN NaN 

    Wigan Worth WP WRP WVPTFP Yorks Young Zeb 
0 NaN NaN NaN NaN  NaN NaN NaN NaN 
1 NaN NaN NaN NaN  NaN NaN NaN NaN 

[2 rows x 147 columns] 

The columns containing the votes for the individual parties are Results.ix[:, 'Unnamed: 9':]

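Most parties receive very few votes in any constituency, so I would like to exclude them. Is there a way (short of iterating over every row and column myself) to return only the columns that meet a particular condition, for example those with at least one value > 1000? Ideally I would be able to specify something like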

Results.ix[:, 'Unnamed: 9': > 1000] 

Answer


You can do it this way:

In [94]: df 
Out[94]: 
      a   b   c   d   e   f   g   h 
0 -1.450976 -1.361099 -0.411566 0.955718 99.882051 -1.166773 -0.468792 100.333169 
1 0.049437 -0.169827 0.692466 -1.441196 0.446337 -2.134966 -0.407058 -0.251068 
2 -0.084493 -2.145212 -0.634506 0.697951 101.279115 -0.442328 -0.470583 99.392245 
3 -1.604788 -1.136284 -0.680803 -0.196149 2.224444 -0.117834 -0.299730 -0.098353 
4 -0.751079 -0.732554 1.235118 -0.427149 99.899120 1.742388 -1.636730 99.822745 
5 0.955484 -0.261814 -0.272451 1.039296 0.778508 -2.591915 -0.116368 -0.122376 
6 0.395136 -1.155138 -0.065242 -0.519787 100.446026 1.584397 0.448349 99.831206 
7 -0.691550 0.052180 0.827145 1.531527 -0.240848 1.832925 -0.801922 -0.298888 
8 -0.673087 -0.791235 -1.475404 2.232781 101.521333 -0.424294 0.088186 99.553973 
9 1.648968 -1.129342 -1.373288 -2.683352 0.598885 0.306705 -1.742007 -0.161067 

In [95]: df[df.loc[:, 'e':].columns[(df.loc[:, 'e':] > 50).any()]] 
Out[95]: 
      e   h 
0 99.882051 100.333169 
1 0.446337 -0.251068 
2 101.279115 99.392245 
3 2.224444 -0.098353 
4 99.899120 99.822745 
5 0.778508 -0.122376 
6 100.446026 99.831206 
7 -0.240848 -0.298888 
8 101.521333 99.553973 
9 0.598885 -0.161067 

Explanation:

In [96]: (df.loc[:, 'e':] > 50).any() 
Out[96]: 
e  True 
f False 
g False 
h  True 
dtype: bool 

In [97]: df.loc[:, 'e':].columns 
Out[97]: Index(['e', 'f', 'g', 'h'], dtype='object') 

In [98]: df.loc[:, 'e':].columns[(df.loc[:, 'e':] > 50).any()] 
Out[98]: Index(['e', 'h'], dtype='object') 
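
The boolean Series from the explanation can also be passed directly to .loc as a column mask, which avoids the detour through .columns. A minimal sketch, assuming a frame built the same way as in the answer; the fixed seed and the names `sub`, `mask` and `selected` are additions for illustration only:

```python
import numpy as np
import pandas as pd

# Rebuild a frame like the one in the answer. The seed is an assumption,
# added only so the sketch is reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 8)), columns=list('abcdefgh'))
df.loc[::2, list('eh')] += 100

sub = df.loc[:, 'e':]        # label slice: columns 'e' through 'h'
mask = (sub > 50).any()      # one boolean per column of the slice
selected = sub.loc[:, mask]  # keep only the columns where mask is True

print(selected.columns.tolist())
```

Because `mask` is indexed by the same column labels as `sub`, .loc aligns the two, so the result holds the same columns as the one-liner in the answer.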

Setup:

In [99]: df = pd.DataFrame(np.random.randn(10, 8), columns=list('abcdefgh')) 

In [100]: df.loc[::2, list('eh')] += 100 

UPDATE:

As of pandas 0.20.1, the .ix indexer is deprecated in favor of the stricter .iloc and .loc indexers.
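
Applied to the question's data, the same idea might look like the sketch below, using .loc in place of .ix. It rests on several assumptions: the toy frame `results` and its party columns ('Party A' and so on) are made up for illustration, and the vote counts are assumed to be stored as strings with thousands separators, as the '49,821' values in the question's output suggest.

```python
import pandas as pd

# Toy stand-in for the question's Results frame; all values are invented.
results = pd.DataFrame({
    'Constituency Name': ['Aberavon', 'Aberconwy'],
    'Electorate': ['49,821', '45,525'],
    'Unnamed: 9': [None, None],
    'Party A': ['15,416', '8,514'],
    'Party B': ['352', None],
    'Party C': [None, '1,204'],
})

# Label-based slice of the party columns, using .loc rather than .ix.
votes = results.loc[:, 'Unnamed: 9':]

# Strip the thousands separators and convert to numbers.
# Anything that cannot be parsed becomes NaN.
numeric = votes.apply(
    lambda col: pd.to_numeric(
        col.astype(str).str.replace(',', '', regex=False),
        errors='coerce',
    )
)

# Keep the columns with at least one value above 1000.
# Comparisons with NaN evaluate to False, so empty columns drop out.
keep = numeric.columns[(numeric > 1000).any()]
filtered = numeric[keep]

print(list(keep))
```

If the data comes from a CSV file, passing thousands=',' to pd.read_csv may make the conversion step unnecessary.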