2015-10-09 152 views

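Pandas: multi-index slicing - mixing slices and lists

I am trying to use the (not really) new slicing operators in pandas, but there is something I am not getting. Suppose I generate the following hierarchical DataFrame: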

import numpy as np
import pandas as pd
from pandas import DataFrame

#Generate container to hold component DFs
df_list=[] 

#Generate names for third dimension positions 
third_names=['front','middle','back'] 

#For three positions in the third dimension... 
for lab in third_names: 
    #...generate the corresponding section of raw data... 
    d=DataFrame(np.random.uniform(size=20).reshape(4,5),columns='a b c d e'.split(' ')) 
    #...name the columns dimension... 
    d.columns.name='dim1' 
    #...generate second and third dims (to go in index)... 
    d['dim2']=['one','two','three','four'] 
    d['dim3']=lab 
    #...set index... 
    d.set_index(['dim3','dim2'],inplace=True) 
    #...and throw the DF in the container 
    df_list.append(d) 

#Concatenate component DFs together 
d3=pd.concat(df_list) 

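#Stack the columns into a third index level and sort the index
#(sort_index replaces sortlevel, which newer pandas no longer provides)
d3_long=d3.stack().sort_index()

print(d3_long)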

This yields:

dim3    dim2   dim1
back    four   a       0.501184
               b       0.627202
               c       0.329643
               d       0.484261
               e       0.884803
        one    a       0.834231
               b       0.918897
               c       0.196537
               d       0.242109
               e       0.860124
        three  a       0.782651
               b       0.998361
               c       0.849685
               d       0.210377
               e       0.866776
        two    a       0.908422
               b       0.737073
               c       0.064402
               d       0.240718
               e       0.044409
front   four   a       0.100877
               b       0.963870
               c       0.254075
               d       0.126556
               e       0.033631
        one    a       0.243552
               b       0.999168
               c       0.752251
               d       0.684718
               e       0.353013
        three  a       0.938928
               b       0.112993
               c       0.615178
               d       0.430318
               e       0.330437
        two    a       0.301921
               b       0.645425
               c       0.464172
               d       0.824765
               e       0.606823
middle  four   a       0.814888
               b       0.228860
               c       0.333184
               d       0.622176
               e       0.151248
        one    a       0.547780
               b       0.592404
               c       0.684111
               d       0.885605
               e       0.601560
        three  a       0.340951
               b       0.839149
               c       0.800098
               d       0.663753
               e       0.215224
        two    a       0.138430
               b       0.917627
               c       0.342968
               d       0.406744
               e       0.822957
dtype: float64

I can slice on the first two dimensions and get the behavior I expect...

print(d3_long.loc[(slice('front','middle'),slice('two','four')),:])

This yields:

dim3    dim2   dim1
front   four   a       0.100877
               b       0.963870
               c       0.254075
               d       0.126556
               e       0.033631
        one    a       0.243552
               b       0.999168
               c       0.752251
               d       0.684718
               e       0.353013
        three  a       0.938928
               b       0.112993
               c       0.615178
               d       0.430318
               e       0.330437
        two    a       0.301921
               b       0.645425
               c       0.464172
               d       0.824765
               e       0.606823
middle  four   a       0.814888
               b       0.228860
               c       0.333184
               d       0.622176
               e       0.151248
        one    a       0.547780
               b       0.592404
               c       0.684111
               d       0.885605
               e       0.601560
        three  a       0.340951
               b       0.839149
               c       0.800098
               d       0.663753
               e       0.215224
        two    a       0.138430
               b       0.917627
               c       0.342968
               d       0.406744
               e       0.822957
dtype: float64

However, the following call produces exactly the same result.

d3_long.loc[(slice('front','middle'),slice('two','four'),slice('b','d')),:] 

It is as if the third level of the MultiIndex is being ignored. When I try to use a list to pick out specific positions...

d3_long.loc[(slice('front','middle'),slice('two','four'),['b','d']),:] 

it raises a TypeError. Any ideas?

Answer

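d3_long is actually a Series, so you don't need the ":" at the end of your slicer. Note that your second-level slice('two','four') won't select anything (it is equivalent to [-1:1]): the dim2 labels are sorted alphabetically (four, one, three, two), so 'two' comes after 'four'.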
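
To see why the direction of the slice matters, here is a minimal sketch using a plain pandas Index as a stand-in for the dim2 level. It assumes the level is stored in sorted order, which is the usual case for a sorted MultiIndex; the variable names are only for illustration.

```python
import pandas as pd

# The dim2 labels from the question, in sorted (alphabetical) order.
dim2 = pd.Index(["one", "two", "three", "four"]).sort_values()
print(dim2.tolist())

# Label slices are resolved to positional slices; the stop label is inclusive.
backwards = dim2.slice_indexer("two", "four")   # starts at the last label, stops after the first
forwards = dim2.slice_indexer("four", "two")    # starts at the first label, stops after the last

print(dim2[backwards].tolist())  # nothing lies between 'two' and 'four' in sorted order
print(dim2[forwards].tolist())   # every label lies between 'four' and 'two'
```

The slice endpoints follow the sorted label order, not the order in which the labels were originally assigned.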

However, if you reverse the order, it should give you what you expect.

In [82]: d3_long.loc[slice('front','middle'),slice('four','two'), ['b','d']] 
Out[82]: 
dim3    dim2   dim1
front   four   b       0.301573
               d       0.478005
        one    b       0.306292
               d       0.281984
        three  b       0.108174
               d       0.776523
        two    b       0.028694
               d       0.527417
middle  four   b       0.285103
               d       0.647165
        one    b       0.807411
               d       0.309446
        three  b       0.277752
               d       0.939555
        two    b       0.470019
               d       0.447640
dtype: float64
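
For anyone who wants to reproduce this without random data, the following is a rough sketch of the same selection. The Series s is a hypothetical stand-in for d3_long with the same index layout but deterministic values, and pd.IndexSlice is shown as an alternative way to write the same tuple of slicers.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for d3_long: same three index levels, deterministic values.
index = pd.MultiIndex.from_product(
    [["front", "middle", "back"], ["one", "two", "three", "four"], list("abcde")],
    names=["dim3", "dim2", "dim1"],
)
s = pd.Series(np.arange(len(index), dtype=float), index=index).sort_index()

# One selector per level: slice, slice, list. No trailing ":" because s is a Series.
picked = s.loc[(slice("front", "middle"), slice("four", "two"), ["b", "d"])]

# pd.IndexSlice builds the same tuple from ordinary slice syntax.
idx = pd.IndexSlice
same = s.loc[idx["front":"middle", "four":"two", ["b", "d"]]]

print(picked)
print(picked.equals(same))
```

Sorting the index first matters here: slicing several levels at once generally requires a lexsorted MultiIndex.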

I was so hung up on the error that I didn't even notice the ordering of the second level. This is useful, thanks.