Selecting from a MultiIndex (with duplicate values)

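When working with hierarchically indexed data, is there an easy way to select a range of values? All the methods I've seen (including xs and .loc) seem to be limited to a single value; see "Benefits of panda's multiindex?". Using this example data: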

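import numpy as np
import pandas as pd

M = 100  # Number of rows to generate

# Create some test data with a two-level MultiIndex named 'a' and 'b'.
# The labels are random integers, so many (a, b) pairs are duplicated.
# (Index.rename returns a new index rather than renaming in place,
# so the names are passed to from_arrays instead.)
df = pd.DataFrame(np.random.randn(M, 10))
df.index = pd.MultiIndex.from_arrays(
    [np.random.randint(4, size=M), np.random.randint(8, size=M)],
    names=['a', 'b'])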

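I'd like to select everything where the first index is 1 or 2 and the second index is 3 or 4. The closest I've come is using .loc with a list of tuples: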

# Now extract a subset 
part = df.loc[[(1, 3), (1,4), (2,3), (2,4)]]
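A sketch that sidesteps the list of tuples, assuming a reasonably recent pandas and NumPy (it uses `np.random.default_rng`): build one boolean test per level with `get_level_values` and `isin`, then combine them with `&`. This keeps every row whose `a` is 1 or 2 and whose `b` is 3 or 4, duplicates included. The seed and the rebuilt `df` are illustrative, not the question's exact data.

```python
import numpy as np
import pandas as pd

# Rebuild data shaped like the question's (random labels, so duplicates);
# the seed is an arbitrary choice for reproducibility
rng = np.random.default_rng(0)
M = 100
df = pd.DataFrame(
    rng.standard_normal((M, 10)),
    index=pd.MultiIndex.from_arrays(
        [rng.integers(0, 4, size=M), rng.integers(0, 8, size=M)],
        names=['a', 'b']))

# One boolean test per level; & combines them into
# "a in {1, 2} AND b in {3, 4}"
mask = (df.index.get_level_values('a').isin([1, 2])
        & df.index.get_level_values('b').isin([3, 4]))
part = df[mask]
print(part.index.unique())
```

Because the mask is evaluated row by row, a pair that does not occur simply contributes no rows; nothing is filled with NaN.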

But this gives some strange behavior:

# The old indices are still shown for some reason 
print(part.index.levels) 

# Good indexing 
print("correct:\n", part.loc[(1, 1)]) 
# No keyerror, although the key wasn't included 
print("wrong:\n", part.loc[[(0, 3)]]) 
# Indexing of first index, and then a column, very odd 
print("odd:\n", part.loc[(1, 9)]) 
# But there is an error accessing the original this way 
print("Expected error:\n", df.loc[(1, 9)])

Output:

[[0, 1, 2, 3], [0, 1, 2, 3, 4, 5, 6, 7]]
correct: 
      0   1   2   3   4   5   6 \ 
1 3 -0.183667 0.578867 -0.944514 0.026295 0.778354 0.603845 0.636486 
    3 -0.337596 0.018084 -0.654721 -1.121475 -0.561706 0.695095 -0.512936 
    3 -0.670779 -0.425093 1.262278 -1.806815 0.855900 -0.230683 -0.225658 
    3 -0.274808 -0.529901 1.265333 0.559646 -1.418687 0.492577 0.141648 

      7   8   9 
1 3 1.109179 -1.569236 -0.617408 
    3 -0.659310 1.249105 0.032657 
    3 0.315601 1.100192 -0.389736 
    3 -0.267462 -0.025189 0.069047 
odd: 
3 -0.617408 
3 0.032657 
3 -0.389736 
3 0.069047 
4 0.217577 
4 -0.232357 
Name: 9, dtype: float64 
wrong: 
     0 1 2 3 4 5 6 7 8 9 
0 3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 
--------------------------------------------------------------------------- 
KeyError         Traceback (most recent call last) 
(truncated)
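On the "old indices" in `part.index.levels`: as far as I know, row selection on a MultiIndex keeps the original level lists and only subsets the per-row codes, so `.levels` still lists labels that no remaining row uses. A small sketch (toy data, assumed for illustration) showing `MultiIndex.remove_unused_levels()`, available since pandas 0.20, dropping them:

```python
import numpy as np
import pandas as pd

# Toy MultiIndex with a duplicated (1, 3) pair, as in the question
idx = pd.MultiIndex.from_arrays([[0, 1, 1, 2, 3, 1], [3, 3, 4, 5, 3, 3]],
                                names=['a', 'b'])
df = pd.DataFrame(np.arange(12).reshape(6, 2), index=idx)

part = df[df.index.get_level_values('a') == 1]
# The selection still carries every original label in .levels
print(part.index.levels)
# remove_unused_levels() returns a new index without the unused labels
clean = part.index.remove_unused_levels()
print(clean.levels)
```

Assign the result back (`part.index = clean`) if the selected frame itself should report only the labels it contains.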

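So is there a better way to access multiple parts of a hierarchical index than a list of tuples? If not, is there a way to clean up the result of indexing with tuples so that it gives a sensible error instead of NaN?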
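If you want to keep an explicit list of (a, b) pairs but get an error instead of a row of NaN, one option is to check the keys yourself before selecting. `select_pairs_strict` below is a hypothetical helper, not a pandas function: it raises `KeyError` for pairs that never occur and otherwise uses `MultiIndex.isin`, so duplicate rows come back in the frame's own order rather than in the order of `keys`.

```python
import pandas as pd


def select_pairs_strict(df, keys):
    """Hypothetical helper: rows whose index tuple is in keys.

    Raises KeyError if any requested tuple does not occur in df.index,
    instead of returning NaN-filled rows for it.
    """
    present = set(df.index)  # iterating a MultiIndex yields tuples
    missing = [k for k in keys if k not in present]
    if missing:
        raise KeyError(f"labels not in index: {missing}")
    # Boolean mask: True for every row whose (a, b) tuple is in keys
    return df[df.index.isin(keys)]


# Usage on a small sorted index with a duplicated (1, 3) pair
idx = pd.MultiIndex.from_arrays([[0, 1, 1, 2, 2], [1, 3, 3, 4, 5]],
                                names=['a', 'b'])
df = pd.DataFrame({'x': range(5)}, index=idx)
print(select_pairs_strict(df, [(1, 3), (2, 4)]))
try:
    select_pairs_strict(df, [(0, 3)])
except KeyError as err:
    print("KeyError:", err)
```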


2016-11-16 user2699

You can use pd.IndexSlice for more human-readable slicing:

In [52]: idx = pd.IndexSlice 

In [53]: dfmi.loc[idx[:,:,['C1','C3']],idx[:,'foo']] 
Out[53]: 
lvl0   a b 
lvl1   foo foo 
A0 B0 C1 D0 8 10 
     D1 12 14 
     C3 D0 24 26 
     D1 28 30 
    B1 C1 D0 40 42 
     D1 44 46 
     C3 D0 56 58 
...   ... ... 
A3 B0 C1 D1 204 206 
     C3 D0 216 218 
     D1 220 222 
    B1 C1 D0 232 234 
     D1 236 238 
     C3 D0 248 250 
     D1 252 254 

[32 rows x 2 columns]

See http://pandas.pydata.org/pandas-docs/stable/advanced.html#using-slicers
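The session above uses a `dfmi` frame that it never defines. A sketch that builds one along the lines of the example in the linked pandas docs (the `mklbl` helper, the level names and the `np.arange` values are modelled on that page and should be treated as assumptions), so the `IndexSlice` call can be run directly:

```python
import numpy as np
import pandas as pd


def mklbl(prefix, n):
    # e.g. mklbl('A', 3) -> ['A0', 'A1', 'A2']
    return [f"{prefix}{i}" for i in range(n)]


# 4 * 2 * 4 * 2 = 64 rows, 4 columns
miindex = pd.MultiIndex.from_product(
    [mklbl('A', 4), mklbl('B', 2), mklbl('C', 4), mklbl('D', 2)])
micolumns = pd.MultiIndex.from_tuples(
    [('a', 'foo'), ('a', 'bar'), ('b', 'foo'), ('b', 'bah')],
    names=['lvl0', 'lvl1'])
values = np.arange(len(miindex) * len(micolumns)).reshape(
    (len(miindex), len(micolumns)))
# Sort both axes so that slicing by level is allowed
dfmi = (pd.DataFrame(values, index=miindex, columns=micolumns)
        .sort_index().sort_index(axis=1))

idx = pd.IndexSlice
# ':' keeps every label on that level; a list keeps only those labels.
# Trailing levels (here the D level) may be omitted.
sub = dfmi.loc[idx[:, :, ['C1', 'C3']], idx[:, 'foo']]
print(sub)
```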


2016-11-16 18:19:48

How would this work with the example given in the question? – user2699

It looks like 'df.loc[IndexSlice[[0,1],[3,4]],:]' should work, but it gives the error 'KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)''. – user2699
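The lexsort error in that comment comes from slicing an unsorted MultiIndex: in the pandas versions I'm aware of, this kind of selection needs the index sorted level by level. A sketch applying `IndexSlice` to data shaped like the question's after calling `sort_index()` first; it uses `[1, 2]` for level `a` to match the question text rather than the `[0, 1]` from the comment, and the random data and seed are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
M = 100
df = pd.DataFrame(
    rng.standard_normal((M, 10)),
    index=pd.MultiIndex.from_arrays(
        [rng.integers(0, 4, size=M), rng.integers(0, 8, size=M)],
        names=['a', 'b']))

# sort_index() returns a copy sorted by every level, which is what the
# "lexsort depth" error is complaining about
df_sorted = df.sort_index()

idx = pd.IndexSlice
# A list per level: a in [1, 2] and b in [3, 4]; ':' keeps all columns
part = df_sorted.loc[idx[[1, 2], [3, 4]], :]
print(part)
```

Sorting once up front is usually cheaper than repeated selections on an unsorted index, and it also makes the result come back grouped by `a`, then `b`.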
