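Most efficient way to exclude indexed rows in a pandas DataFrame

I'm relatively new to Python & pandas and am struggling with (hierarchical) indexes. I've got the basics covered, but am lost when it comes to more advanced slicing and cross-sectioning.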

For example, take the following DataFrame:

import pandas as pd 
import numpy as np 
data = pd.DataFrame(np.arange(9).reshape((3, 3)), 
    index=pd.Index(['Ohio', 'Colorado', 'New York'], name='state'), columns=pd.Index(['one', 'two', 'three'], name='number'))

I want to select everything except the row with index 'Colorado'. For a small dataset I could do:

data.ix[['Ohio','New York']]

But if the number of unique index values is large, that's impractical. Naively, I would expect a syntax like

data.ix[['state' != 'Colorado']]

But this only returns the first record, 'Ohio', and doesn't return 'New York'. The following works, but is cumbersome:

# Collect every index label except 'Colorado', then select those rows by label
filter = list(set(data.index.get_level_values(0).unique()) - set(['Colorado']))
data.loc[filter]

Surely there must be a more Pythonic, less verbose way of doing this?

2014-02-08 dkapitan

This is a Python issue, not a pandas one: 'state' != 'Colorado' is True, so what pandas receives is data.ix[[True]].
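To see why the naive attempt fails, it can help to evaluate the pieces separately. The following is a minimal sketch built on the question's DataFrame; the names naive_key and mask are illustrative only and do not come from the thread.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(9).reshape((3, 3)),
    index=pd.Index(['Ohio', 'Colorado', 'New York'], name='state'),
    columns=pd.Index(['one', 'two', 'three'], name='number'))

# Python compares the two string literals itself, before pandas is involved.
naive_key = 'state' != 'Colorado'
print(naive_key)  # True

# Comparing the index object instead is element-wise: one boolean per row.
mask = data.index != 'Colorado'
print(mask)  # [ True False  True]

# The boolean array can then be used to select rows.
selected = data.loc[mask]
```

Here selected keeps the 'Ohio' and 'New York' rows, because only those positions in mask are True.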

You could do

>>> data.loc[data.index != "Colorado"]
number    one  two  three
state
Ohio        0    1      2
New York    6    7      8

[2 rows x 3 columns]

or use DataFrame.query:

>>> data.query("state != 'New York'")
number    one  two  three
state
Ohio        0    1      2
Colorado    3    4      5

[2 rows x 3 columns]

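This is handy if you don't like repeating data. (Passing the expression to .query() as a quoted string is one of the only ways around the fact that Python would otherwise evaluate the comparison before pandas ever sees it.)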
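When several labels need to be excluded at once, the same boolean-mask idea extends naturally. The sketch below uses Index.isin and DataFrame.drop, neither of which appears in the answer above; they are shown only as possible alternatives, and excluded is a name chosen for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(9).reshape((3, 3)),
    index=pd.Index(['Ohio', 'Colorado', 'New York'], name='state'),
    columns=pd.Index(['one', 'two', 'three'], name='number'))

excluded = ['Colorado', 'New York']  # hypothetical list of labels to leave out

# Option 1: build a boolean mask and invert it with ~.
kept_by_mask = data.loc[~data.index.isin(excluded)]

# Option 2: drop the labels directly (raises KeyError if a label is missing).
kept_by_drop = data.drop(index=excluded)

print(kept_by_mask)
```

Both options keep the original row order. The mask variant silently ignores labels that are not in the index, whereas drop complains about them by default.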

2014-02-08 19:40:25 DSM

Thanks: that clarified a lot! – dkapitan

This is a robust solution that also works with MultiIndex objects.

Single index

excluded = ['Ohio'] 
indices = data.index.get_level_values('state').difference(excluded) 
indx = pd.IndexSlice[indices.values]

Output

In [77]: data.loc[indx]
Out[77]:
number    one  two  three
state
Colorado    3    4      5
New York    6    7      8
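One detail worth keeping in mind with this approach: Index.difference returns its result sorted where possible, so the selected rows may come back in a different order than in the original frame. A minimal sketch on the question's DataFrame, excluding 'Colorado' instead of 'Ohio' so the effect is visible:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(9).reshape((3, 3)),
    index=pd.Index(['Ohio', 'Colorado', 'New York'], name='state'),
    columns=pd.Index(['one', 'two', 'three'], name='number'))

excluded = ['Colorado']
indices = data.index.get_level_values('state').difference(excluded)

# difference() sorts the remaining labels, so 'New York' now precedes 'Ohio'.
print(list(indices))  # ['New York', 'Ohio']

# Selecting by label follows the order of the labels that were passed in.
result = data.loc[indices.values]
print(list(result.index))  # ['New York', 'Ohio']
```

If the original order matters, a boolean mask such as data.index != 'Colorado' leaves the rows where they were.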

MultiIndex extension

Here I extend this to a MultiIndex example...

# Note: the `labels=` keyword of pd.MultiIndex was renamed to `codes=` in pandas 0.24
data = pd.DataFrame(
    np.arange(18).reshape(6, 3),
    index=pd.MultiIndex(
        levels=[[u'AU', u'UK'], [u'Derby', u'Kensington', u'Newcastle', u'Sydney']],
        codes=[[0, 0, 0, 1, 1, 1], [0, 2, 3, 0, 1, 2]],
        names=[u'country', u'town']),
    columns=pd.Index(['one', 'two', 'three'], name='number'))

Suppose we want to exclude both occurrences of 'Newcastle' from this new MultiIndex:

excluded = ['Newcastle'] 
indices = data.index.get_level_values('town').difference(excluded) 
indx = pd.IndexSlice[:, indices.values]

This gives the expected result:

In [115]: data.loc[indx, :]
Out[115]:
number              one  two  three
country town
AU      Derby         0    1      2
        Sydney        6    7      8
UK      Derby         9   10     11
        Kensington   12   13     14

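Common pitfalls

Make sure all levels of the index are sorted; you may need data.sort_index(inplace=True).
Make sure you include the columns: data.loc[indx, :].
Sometimes indx = pd.IndexSlice[:, indices] is good enough, but I found that I often need to use indx = pd.IndexSlice[:, indices.values].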
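The pitfalls above can be sketched end to end. The example below deliberately builds the MultiIndex in unsorted order with MultiIndex.from_tuples (an assumption made for illustration, not the construction used in the answer), sorts it, and then applies the same IndexSlice selection; frame, towns and result are illustrative names.

```python
import numpy as np
import pandas as pd

# Deliberately unsorted index, to show why sorting comes first.
index = pd.MultiIndex.from_tuples(
    [('UK', 'Newcastle'), ('AU', 'Sydney'), ('UK', 'Derby'),
     ('AU', 'Derby'), ('AU', 'Newcastle'), ('UK', 'Kensington')],
    names=['country', 'town'])
frame = pd.DataFrame(
    np.arange(18).reshape(6, 3),
    index=index,
    columns=pd.Index(['one', 'two', 'three'], name='number'))

# Pitfall 1: sort all index levels before slicing across levels.
frame = frame.sort_index()

excluded = ['Newcastle']
towns = frame.index.get_level_values('town').difference(excluded)

# Pitfall 3: pass plain values; pitfall 2: include the column slice.
indx = pd.IndexSlice[:, towns.values]
result = frame.loc[indx, :]
print(result)
```

The result keeps the Derby, Sydney and Kensington rows for both countries and omits every 'Newcastle' row.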

2017-08-03 15:03:07
