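Why does level order matter for MultiIndex selection with .ix?
I am still trying to understand MultiIndex selection. Build the data frame: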
import pandas as pd
from numpy import *
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
          ['cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog']]
tuples = zip(*arrays)
index = pd.MultiIndex.from_tuples(tuples, names=['first','second','third'])
data = pd.DataFrame(random.randn(8,3), index=index, columns=['c1','c2','c3'])
>>> data
                          c1        c2        c3
first second third
bar   one    cat   -0.309651 -0.242866  0.824422
      two    cat   -0.349640  0.873796 -1.879832
baz   one    cat   -0.851390 -1.241419 -0.016495
      two    cat    0.737211 -0.617967 -2.215459
foo   one    dog   -0.231820  0.140641 -1.619270
      two    dog   -1.363132 -0.929765 -0.005083
qux   one    dog   -1.187903 -0.753883 -0.442464
      two    dog    0.652967  0.423994 -0.705735
Question 1: If I want to change the "c1" values of a subset of rows to c1 * 10, how can I do that? I tried:
data.ix['cat'].c1 = data.ix['cat'].c1*10
# Also tried
data.xs('cat',level='second').c1 = data.xs('cat',level='second').c1*10
Neither of these worked. I get a KeyError for the first and
"TypeError: 'instancemethod' object has no attribute '__getitem__'" for the second.
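A note for anyone reproducing this: 'cat' is a label in the level named 'third', not 'second', so a cross-section has to name that level. Here is a minimal sketch of a read-only selection, assuming a recent pandas. The arange values are placeholders (an assumption, not the random data from the question). Because xs returns a new object, assigning to its result should not be expected to write back to data.

```python
import numpy as np
import pandas as pd

arrays = [
    ['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
    ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
    ['cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog'],
]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second', 'third'])

# Placeholder values (assumption): 0..23 instead of random numbers,
# so the selected rows are easy to recognise.
data = pd.DataFrame(
    np.arange(24, dtype=float).reshape(8, 3),
    index=index,
    columns=['c1', 'c2', 'c3'],
)

# 'cat' belongs to the level named 'third', so that is the level to pass.
cats = data.xs('cat', level='third')

# The result is a smaller frame; the selected level is dropped from its index.
print(cats)
```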
Confusing solution
I found a solution by reordering the index levels, but it behaves strangely (at least it seems strange to me).
d = data.copy()
d.index = d.index.reorder_levels([2,0,1])
>>> d
                          c1        c2        c3
third first second
cat   bar   one    -0.309651 -0.242866  0.824422
            two    -0.349640  0.873796 -1.879832
      baz   one    -0.851390 -1.241419 -0.016495
            two     0.737211 -0.617967 -2.215459
dog   foo   one    -0.231820  0.140641 -1.619270
            two    -1.363132 -0.929765 -0.005083
      qux   one    -1.187903 -0.753883 -0.442464
            two     0.652967  0.423994 -0.705735
# Now perform the operation (use *NaN below to make changes easily distinguished)
d.ix['cat'].c1 = d.ix['cat'].c1*NaN
>>> d
                          c1        c2        c3
third first second
cat   bar   one          NaN -0.242866  0.824422
            two          NaN  0.873796 -1.879832
      baz   one          NaN -1.241419 -0.016495
            two          NaN -0.617967 -2.215459
dog   foo   one    -0.231820  0.140641 -1.619270
            two    -1.363132 -0.929765 -0.005083
      qux   one    -1.187903 -0.753883 -0.442464
            two     0.652967  0.423994 -0.705735
Great! That works. But what if the first index level is "second" instead?
d = data.copy()
d.index = d.index.reorder_levels([1,0,2])
>>> d
                          c1        c2        c3
second first third
one    bar   cat   -0.309651 -0.242866  0.824422
two    bar   cat   -0.349640  0.873796 -1.879832
one    baz   cat   -0.851390 -1.241419 -0.016495
two    baz   cat    0.737211 -0.617967 -2.215459
one    foo   dog   -0.231820  0.140641 -1.619270
two    foo   dog   -1.363132 -0.929765 -0.005083
one    qux   dog   -1.187903 -0.753883 -0.442464
two    qux   dog    0.652967  0.423994 -0.705735
# Using the same logic as above...
d.ix['two'].c1 = d.ix['two'].c1*NaN
>>> d
                          c1        c2        c3
second first third
one    bar   cat   -0.309651 -0.242866  0.824422
two    bar   cat   -0.349640  0.873796 -1.879832
one    baz   cat   -0.851390 -1.241419 -0.016495
two    baz   cat    0.737211 -0.617967 -2.215459
one    foo   dog   -0.231820  0.140641 -1.619270
two    foo   dog   -1.363132 -0.929765 -0.005083
one    qux   dog   -1.187903 -0.753883 -0.442464
two    qux   dog    0.652967  0.423994 -0.705735
No change! However, this (below) does work:
# Keeping same data frame from previous example
d.c1.ix['two'] = d.ix['two'].c1*NaN
>>> d
                          c1        c2        c3
second first third
one    bar   cat   -0.309651 -0.242866  0.824422
two    bar   cat         NaN  0.873796 -1.879832
one    baz   cat   -0.851390 -1.241419 -0.016495
two    baz   cat         NaN -0.617967 -2.215459
one    foo   dog   -0.231820  0.140641 -1.619270
two    foo   dog         NaN -0.929765 -0.005083
one    qux   dog   -1.187903 -0.753883 -0.442464
two    qux   dog         NaN  0.423994 -0.705735
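Question 2: I don't understand why the choice between d.ix['ID'].c1 and d.c1.ix['ID'] depends on how the data frame's index levels are ordered. Does this make sense to anyone else? If so, can you explain what is going on here? Any help is much appreciated.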
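A possible reading, hedged because it depends on pandas internals and version: d.ix['two'].c1 = ... is chained assignment, which only reaches the original frame if the intermediate selection is a view rather than a copy, and that appears to depend on whether the index is sorted. The sketch below assumes a recent pandas (where .ix has been removed) and uses placeholder arange values. It rebuilds both level orders from the question and performs each assignment in a single .loc call, which does not rely on view-or-copy behaviour. The names sorted_d, unsorted_d, two_mask and cat_mask are illustrative only.

```python
import numpy as np
import pandas as pd

arrays = [
    ['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
    ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
    ['cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog'],
]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second', 'third'])

# Placeholder values (assumption) so that NaN entries stand out.
data = pd.DataFrame(
    np.arange(24, dtype=float).reshape(8, 3),
    index=index,
    columns=['c1', 'c2', 'c3'],
)

# The two level orders tried in the question.
sorted_d = data.reorder_levels(['third', 'first', 'second'])
unsorted_d = data.reorder_levels(['second', 'first', 'third'])

# Only the first ordering leaves the index in sorted order.
print(sorted_d.index.is_monotonic_increasing)
print(unsorted_d.index.is_monotonic_increasing)

# Select rows and column in one .loc call, so the assignment
# targets the frame itself regardless of index ordering.
cat_mask = sorted_d.index.get_level_values('third') == 'cat'
sorted_d.loc[cat_mask, 'c1'] = np.nan

two_mask = unsorted_d.index.get_level_values('second') == 'two'
unsorted_d.loc[two_mask, 'c1'] = np.nan

print(unsorted_d)
```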
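Read: http://pandas.pydata.org/pandas-docs/dev/indexing.html#indexing-view-versus-copy. Use `df.loc[row, column] = value` to make sure you are setting the actual object. – Jeff
So you just want to change a sub-frame? A certain level of the index for a certain column? Or all of column "c1"? – Jeff
I just want to change a sub-frame. – tnknepp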
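Following the .loc suggestion in the comments, here is a minimal sketch for Question 1, assuming a recent pandas. It multiplies "c1" by 10 for the rows whose 'third' level is 'cat', without reordering the index. The seeded random generator and the names is_cat and original are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

arrays = [
    ['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
    ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
    ['cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog'],
]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second', 'third'])

# Seeded generator (assumption) so the example is reproducible.
rng = np.random.default_rng(0)
data = pd.DataFrame(
    rng.standard_normal((8, 3)),
    index=index,
    columns=['c1', 'c2', 'c3'],
)
original = data.copy()  # kept only to compare before and after

# Boolean mask over the rows, built from one level of the MultiIndex.
is_cat = data.index.get_level_values('third') == 'cat'

# Rows and column are given together in one .loc call, so the write
# goes to `data` itself rather than to an intermediate selection.
data.loc[is_cat, 'c1'] = data.loc[is_cat, 'c1'] * 10

print(data)
```

The mask works for any level and any level order, since it does not depend on the index being sorted.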