2013-12-18

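Why does the level order matter for MultiIndex selection with .ix?

Still trying to understand MultiIndex selection. Creating the DataFrame: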

import pandas as pd 
from numpy import * 

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], 
      ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'], 
      ['cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog']] 

tuples = list(zip(*arrays)) 
index = pd.MultiIndex.from_tuples(tuples, names=['first','second','third']) 
data = pd.DataFrame(random.randn(8,3), index=index, columns=['c1','c2','c3']) 

>>> data 
          c1  c2  c3 
first second third        
bar one cat -0.309651 -0.242866 0.824422 
     two cat -0.349640 0.873796 -1.879832 
baz one cat -0.851390 -1.241419 -0.016495 
     two cat 0.737211 -0.617967 -2.215459 
foo one dog -0.231820 0.140641 -1.619270 
     two dog -1.363132 -0.929765 -0.005083 
qux one dog -1.187903 -0.753883 -0.442464 
     two dog 0.652967 0.423994 -0.705735 
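The question's code targets Python 2 and an old pandas; on Python 3, `zip` returns an iterator rather than a list. A sketch of an equivalent setup, assuming current pandas and NumPy: `pd.MultiIndex.from_arrays` builds the same index directly from the three lists, and a seeded `np.random.default_rng` (the seed 0 is arbitrary) makes the numbers reproducible, so they will differ from the ones printed above.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
          ['cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog']]
names = ['first', 'second', 'third']

# from_arrays takes one list per level, so no zip() is needed
index = pd.MultiIndex.from_arrays(arrays, names=names)

# Same index as the tuple-based construction in the question
assert index.equals(pd.MultiIndex.from_tuples(list(zip(*arrays)), names=names))

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
data = pd.DataFrame(rng.standard_normal((8, 3)), index=index,
                    columns=['c1', 'c2', 'c3'])
```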

Question 1: If I want to change the "c1" values to c1 * 10 for the "cat" rows, how can I do that? I tried:

data.ix['cat'].c1 = data.ix['cat'].c1*10 
# Also tried 
data.xs('cat',level='second').c1 = data.xs('cat',level='second').c1*10 

Neither of these worked. I get a "KeyError" for the first and "TypeError: 'instancemethod' object has no attribute '__getitem__'" for the second.
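`.ix` was deprecated in pandas 0.20 and removed in 1.0. For reference, a sketch of Question 1 in current pandas, assuming the frame from the question with placeholder values (0.0 to 23.0) instead of random ones: `pd.IndexSlice` inside a single `.loc[...]` selects the 'cat' rows of column 'c1' and writes to them, and `xs(..., level='third')` reads them back by level name without reordering anything. Because rows and column are chosen in one indexing call, the write lands in `data` itself; `xs` returns a new object, so it is only used here for reading.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
          ['cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second', 'third'])
# Placeholder values 0.0 .. 23.0 so the effect is easy to check
data = pd.DataFrame(np.arange(24, dtype=float).reshape(8, 3),
                    index=index, columns=['c1', 'c2', 'c3'])
before = data.copy()

# Write: ':' matches any label on 'first' and 'second', 'cat' pins 'third'
idx = pd.IndexSlice
data.loc[idx[:, :, 'cat'], 'c1'] *= 10

# Read back: cross-section on the level named 'third'
# (that level is dropped from the result by default)
cats = data.xs('cat', level='third')
```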

Confusing solution

I found a solution by reordering the index levels, but it behaves strangely (at least it seems strange to me).

d = data.copy() 
d.index = d.index.reorder_levels([2,0,1]) 
>>> d 
          c1  c2  c3 
third first second        
cat bar one -0.309651 -0.242866 0.824422 
      two -0.349640 0.873796 -1.879832 
     baz one -0.851390 -1.241419 -0.016495 
      two  0.737211 -0.617967 -2.215459 
dog foo one -0.231820 0.140641 -1.619270 
      two -1.363132 -0.929765 -0.005083 
     qux one -1.187903 -0.753883 -0.442464 
      two  0.652967 0.423994 -0.705735 


# Now perform the operation (multiplying by NaN makes the changed cells easy to spot) 
d.ix['cat'].c1 = d.ix['cat'].c1*NaN 

>>> d 
          c1  c2  c3 
third first second        
cat bar one   NaN -0.242866 0.824422 
      two   NaN 0.873796 -1.879832 
     baz one   NaN -1.241419 -0.016495 
      two   NaN -0.617967 -2.215459 
dog foo one -0.231820 0.140641 -1.619270 
      two -1.363132 -0.929765 -0.005083 
     qux one -1.187903 -0.753883 -0.442464 
      two  0.652967 0.423994 -0.705735 

Great, this works! But what if the first index level is "second"?

d = data.copy() 
d.index = d.index.reorder_levels([1,0,2]) 
>>> d 
          c1  c2  c3 
second first third        
one bar cat -0.309651 -0.242866 0.824422 
two bar cat -0.349640 0.873796 -1.879832 
one baz cat -0.851390 -1.241419 -0.016495 
two baz cat 0.737211 -0.617967 -2.215459 
one foo dog -0.231820 0.140641 -1.619270 
two foo dog -1.363132 -0.929765 -0.005083 
one qux dog -1.187903 -0.753883 -0.442464 
two qux dog 0.652967 0.423994 -0.705735 

# Using the same logic as above... 
d.ix['two'].c1 = d.ix['two'].c1*NaN 

>>> d 
          c1  c2  c3 
second first third        
one bar cat -0.309651 -0.242866 0.824422 
two bar cat -0.349640 0.873796 -1.879832 
one baz cat -0.851390 -1.241419 -0.016495 
two baz cat 0.737211 -0.617967 -2.215459 
one foo dog -0.231820 0.140641 -1.619270 
two foo dog -1.363132 -0.929765 -0.005083 
one qux dog -1.187903 -0.753883 -0.442464 
two qux dog 0.652967 0.423994 -0.705735 

No change! However, the following does work:

# Keeping the same DataFrame from the previous example 
d.c1.ix['two'] = d.ix['two'].c1*NaN 

>>> d 
          c1  c2  c3 
second first third        
one bar cat -0.309651 -0.242866 0.824422 
two bar cat   NaN 0.873796 -1.879832 
one baz cat -0.851390 -1.241419 -0.016495 
two baz cat   NaN -0.617967 -2.215459 
one foo dog -0.231820 0.140641 -1.619270 
two foo dog   NaN -0.929765 -0.005083 
one qux dog -1.187903 -0.753883 -0.442464 
two qux dog   NaN 0.423994 -0.705735 

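Question 2: I don't understand why it matters whether I write d.ix['ID'].c1 or d.c1.ix['ID'], and why that depends on how the DataFrame's index levels are ordered. Does this make sense to anyone? If so, could you explain what is going on here? Any help is much appreciated.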
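A likely explanation, in line with the view-versus-copy documentation linked in the comments: when 'cat' is the outermost level of a sorted index, its rows form one contiguous block that pandas can address with a slice, whereas 'two' on an interleaved outer level can only be addressed with a boolean mask. Slicing can return a view, mask selection returns a copy, and a chained assignment such as `d.ix['two'].c1 = ...` only reaches `d` in the view case. The sketch below, assuming current pandas and placeholder values, uses `MultiIndex.get_loc` to show the two kinds of lookup result; it illustrates the index lookup only and does not reproduce the removed `.ix` code path.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
          ['cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second', 'third'])
data = pd.DataFrame(np.arange(24, dtype=float).reshape(8, 3),
                    index=index, columns=['c1', 'c2', 'c3'])

# 'third' outermost: all 'cat' rows sit next to each other, index stays sorted
by_third = data.reorder_levels(['third', 'first', 'second'])
# 'second' outermost: 'one' and 'two' alternate from row to row
by_second = data.reorder_levels(['second', 'first', 'third'])

# How the index locates each outer label
loc_cat = by_third.index.get_loc('cat')   # contiguous block -> slice
loc_two = by_second.index.get_loc('two')  # scattered rows -> boolean mask

print(by_third.index.is_monotonic_increasing, loc_cat)
print(by_second.index.is_monotonic_increasing, loc_two)
```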


Read: http://pandas.pydata.org/pandas-docs/dev/indexing.html#indexing-view-versus-copy. Use "df.loc[row, column] = value" to make sure you are setting the actual object. – Jeff
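Following that advice, a minimal sketch assuming current pandas and placeholder values: the row label and the column label go into one `.loc` call, so there is no chained indexing and the assignment lands in `d` regardless of level order. The `sort_index()` call is optional; it makes the MultiIndex lexsorted again after `reorder_levels`, since label lookups on an unsorted MultiIndex can emit a PerformanceWarning ("indexing past lexsort depth may impact performance").

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
          ['cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second', 'third'])
data = pd.DataFrame(np.arange(24, dtype=float).reshape(8, 3),
                    index=index, columns=['c1', 'c2', 'c3'])

# 'second' moved to the outermost level, then sorted so the 'one' rows
# and the 'two' rows each form a contiguous block
d = data.reorder_levels(['second', 'first', 'third']).sort_index()

# Row label and column label in ONE .loc call: no chained indexing,
# so the assignment modifies d itself, whatever the level order.
d.loc['two', 'c1'] = np.nan
```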


So you only want to change a sub-frame, i.e. a certain column for certain values of an index level? Or all of column "c1"? – Jeff


Just want to change a sub-frame. – tnknepp

Answer


Your data:

In [48]: data = pd.DataFrame(random.randn(8,3), index=index, columns=['c1','c2','c3']) 

In [49]: data 
Out[49]: 
          c1  c2  c3 
first second third        
bar one cat 0.219103 -1.142457 0.045307 
     two cat 0.890187 1.097527 0.074196 
baz one cat -0.043345 -0.595815 0.775877 
     two cat -0.694324 -0.757964 -1.253632 
foo one dog -2.182311 0.474872 1.444720 
     two dog 1.482957 -0.658113 0.743051 
qux one dog 1.544032 -0.225756 0.821863 
     two dog 0.121410 -0.143425 1.157422 

[8 rows x 3 columns] 

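Compose a mask of the rows whose values you want to change (it could be more complicated, or even built by hand); you do need one boolean per index entry, though (i.e. the mask must have the same length as the frame).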

In [50]: mask = data.index.get_level_values('third') == 'cat' 

In [51]: mask 
Out[51]: array([ True, True, True, True, False, False, False, False], dtype=bool) 
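As a hypothetical example of a more complicated mask: per-level boolean arrays combine element-wise with `&` (and), `|` (or) and `~` (not). The sketch below, assuming current pandas, selects the rows where 'third' is 'cat' and 'second' is 'two'; the result still has one boolean per row.

```python
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
          ['cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second', 'third'])

levels = index.get_level_values
# Element-wise AND of two per-level tests; the parentheses are required
# because & binds more tightly than == in Python.
mask = (levels('third') == 'cat') & (levels('second') == 'two')
print(mask)
```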

Then index directly:

In [52]: data.loc[mask,'c1'] *= 10 

In [53]: data 
Out[53]: 
          c1  c2  c3 
first second third        
bar one cat 2.191029 -1.142457 0.045307 
     two cat 8.901870 1.097527 0.074196 
baz one cat -0.433448 -0.595815 0.775877 
     two cat -6.943241 -0.757964 -1.253632 
foo one dog -2.182311 0.474872 1.444720 
     two dog 1.482957 -0.658113 0.743051 
qux one dog 1.544032 -0.225756 0.821863 
     two dog 0.121410 -0.143425 1.157422 

[8 rows x 3 columns]
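Because the mask is built from a level name and holds one boolean per row, this approach works no matter where 'third' sits in the index, which sidesteps the ordering issue from Question 2. A sketch with placeholder values, assuming current pandas, applying the same two steps after 'second' has been moved to the front:

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
          ['cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second', 'third'])
data = pd.DataFrame(np.arange(24, dtype=float).reshape(8, 3),
                    index=index, columns=['c1', 'c2', 'c3'])

# Same rows in the same order, only the level order changes
d = data.reorder_levels(['second', 'first', 'third'])

# The mask looks 'third' up by name, so its position is irrelevant
mask = d.index.get_level_values('third') == 'cat'
d.loc[mask, 'c1'] *= 10
```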