2014-02-17 61 views
4

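Python: normalizing some columns of a pandas DataFrame

I have a DataFrame in which I want to normalize some arbitrary columns using other arbitrary columns: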

import itertools as it 
import numpy as np 
import pandas as pd 

header = tuple(['h_seqNum', 'h_stamp', 'user_id']) 
joints = tuple(['head', 'neck', 'torso']) 
attribs = tuple(['pos_x','pos_y','pos_z']) 

all_columns = it.izip(*it.product(joints, attribs)) 
multiind_first = list(it.chain(['header']*len(header), all_columns.next(), ['pose',])) 
multiind_second = list(it.chain(header, all_columns.next(), ['pose',])) 

df = pd.DataFrame(np.random.rand(65).reshape(5,13), columns = pd.MultiIndex.from_arrays([multiind_first, multiind_second], names=['joint', 'attrib'])) 
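
The snippet above relies on Python 2 only features (`it.izip` and the iterator `.next()` method). For readers on Python 3, the same column layout can be sketched roughly as follows. The names `column_tuples`, `columns` and `rng` are introduced here for illustration, and a seeded generator replaces `np.random.rand`, so the values will differ from the ones printed in the question.

```python
import itertools as it

import numpy as np
import pandas as pd

header = ('h_seqNum', 'h_stamp', 'user_id')
joints = ('head', 'neck', 'torso')
attribs = ('pos_x', 'pos_y', 'pos_z')

# One (joint, attrib) tuple per column: header fields, joint coordinates, pose.
column_tuples = (
    [('header', h) for h in header]
    + list(it.product(joints, attribs))
    + [('pose', 'pose')]
)
columns = pd.MultiIndex.from_tuples(column_tuples, names=['joint', 'attrib'])

# Seeded generator (an assumption made here for reproducibility).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((5, len(columns))), columns=columns)
```

Building the tuples directly avoids having to keep two parallel level lists in sync, which is where the variable-name mix-up mentioned in the comments came from.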

The resulting DataFrame looks like this:

joint     header                     head                 neck                torso                pose
attrib  h_seqNum  h_stamp  user_id  pos_x  pos_y  pos_z  pos_x  pos_y  pos_z  pos_x  pos_y  pos_z  pose
0          0.681    0.059    0.607  0.093  0.504  0.975  0.317  0.739  0.129  0.759  0.254  0.814     1
1          0.914    0.420    0.305  0.242  0.700  0.180  0.324  0.171  0.477  0.943  0.877  0.069     0
2          0.522    0.395    0.118  0.739  0.653  0.326  0.947  0.517  0.036  0.647  0.079  0.227     0
3          0.475    0.815    0.792  0.208  0.472  0.427  0.213  0.544  0.440  0.033  0.636  0.527     2
4          0.767    0.774    0.983  0.646  0.949  0.947  0.402  0.015  0.913  0.734  0.192  0.032     0

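I want to normalize all the columns (attribs) that belong to an arbitrary joint (e.g. 'head') using another arbitrary joint (e.g. 'torso'). For example, something like this: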

df['head'] = df['head'] - df['torso'] 
df['neck'] = df['neck'] - df['torso'] 
# Note that torso remains "unnormalized" 

To do this I wrote a function:

def normalize_joints(df, from_joint):
    joint_names = set(joints) - set([from_joint,])
    for j in list(joint_names):
        df[j] = df[j] - df[from_joint]

However, when I run this function I get the following error:

normalize_joints(df, 'torso') 

--------------------------------------------------------------------------- 
AttributeError       Traceback (most recent call last) 
<ipython-input-414-47f39f04716d> in <module>() 
----> 1 normalize_joints(df, 'torso') 

<ipython-input-407-cf13a67fabd8> in normalize_joints(df, from_joint) 
     2  joint_names = set(joints) - set([from_joint,]) 
     3  for j in list(joint_names): 
----> 4   df[j] = df[j] - df[from_joint] 

/Library/Python/2.7/site-packages/pandas/core/frame.pyc in __setitem__(self, key, value) 
    2117           fill_value, limit, takeable=takeable) 
    2118 
-> 2119   return frame 
    2120 
    2121  def _reindex_index(self, new_index, method, copy, level, fill_value=NA, 

/Library/Python/2.7/site-packages/pandas/core/frame.pyc in _set_item(self, key, value) 
    2164  @Appender(_shared_docs['reindex_axis'] % _shared_doc_kwargs) 
    2165  def reindex_axis(self, labels, axis=0, method=None, level=None, copy=True, 
-> 2166      limit=None, fill_value=np.nan): 
    2167   return super(DataFrame, self).reindex_axis(labels=labels, axis=axis, 
    2168             method=method, level=level, 

/Library/Python/2.7/site-packages/pandas/core/generic.pyc in _set_item(self, key, value) 
    677 
    678  __bool__ = __nonzero__ 
--> 679 
    680  def bool(self): 
    681   """ Return the bool of a single element PandasObject 

/Library/Python/2.7/site-packages/pandas/core/internals.pyc in set(self, item, value) 
    1768  def sp_index(self): 
    1769   return self.values.sp_index 
-> 1770 
    1771  @property 
    1772  def kind(self): 

/Library/Python/2.7/site-packages/pandas/core/internals.pyc in _reset_ref_locs(self) 
    1054   # see if we can align other 
    1055   if hasattr(other, 'reindex_axis'): 
-> 1056    if align: 
    1057     axis = getattr(other, '_info_axis_number', 0) 
    1058     other = other.reindex_axis(self.items, axis=axis, 

/Library/Python/2.7/site-packages/pandas/core/internals.pyc in _rebuild_ref_locs(self) 
    1062 
    1063   # make sure that we can broadcast 
-> 1064   is_transposed = False 
    1065   if hasattr(other, 'ndim') and hasattr(values, 'ndim'): 
    1066    if values.ndim != other.ndim or values.shape == other.shape[::-1]: 

AttributeError: _ref_locs 

After several attempts I have not been able to find the source of the error. If I run the operation

df['head'] - df['torso'] 

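it returns a DataFrame with the correct result. However, when I try to assign this DataFrame to df['head'], I get the error shown above.

Is there any way to perform this assignment?

I would also like to know whether there is a better way to perform the same normalization than the one I am attempting. Perhaps using groupby and then applying a normalize function to the selected DataFrame?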

EDIT: This error occurs with numpy 1.6 and pandas 0.12.

After upgrading to numpy 1.8 and pandas 0.13, the following operation works:

df['head'] = df['head'] - df['torso'] 
+0

In your first code block, you need to replace 'mi_level_one' with 'multiind_first' and 'mi_level_two' with 'multiind_second'. – LondonRob

+0

Replaced. It was just a copy-paste problem with my code. Thanks! – VGonPa

Answers

2

I believe I have found a fairly simple solution:

def normalize(df, from_joint):
    # Return the result; without it the function yields None and df.update() fails
    return df.drop(['header', 'pose', from_joint], axis=1, level='joint').sub(df[from_joint], level=1)

df.update(normalize(df, 'torso'))
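
A Python 3 sketch of the idea in this answer, assuming a recent pandas in which `DataFrame.sub` accepts `level=` to broadcast across one level of the column MultiIndex. The frame is rebuilt here so the block runs on its own. The `skip` parameter and the `normalized` copy are additions for illustration and are not part of the original answer.

```python
import itertools as it

import numpy as np
import pandas as pd

header = ('h_seqNum', 'h_stamp', 'user_id')
joints = ('head', 'neck', 'torso')
attribs = ('pos_x', 'pos_y', 'pos_z')

tuples = (
    [('header', h) for h in header]
    + list(it.product(joints, attribs))
    + [('pose', 'pose')]
)
columns = pd.MultiIndex.from_tuples(tuples, names=['joint', 'attrib'])
df = pd.DataFrame(np.random.default_rng(0).random((5, len(columns))), columns=columns)


def normalize(df, from_joint, skip=('header', 'pose')):
    """Return the remaining joints' columns minus the reference joint's columns."""
    # Drop the non-joint groups and the reference joint itself.
    others = df.drop(columns=list(skip) + [from_joint], level='joint')
    # level='attrib' matches pos_x/pos_y/pos_z across every remaining joint.
    return others.sub(df[from_joint], level='attrib')


# Work on a copy so the original frame stays available for comparison.
normalized = df.copy()
normalized.update(normalize(df, 'torso'))
```

Because `normalize` returns only the columns it changed, `update` leaves the header, pose and reference-joint columns as they were.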
2

The problem is that your columns are an instance of MultiIndex. Please try this:

def normalize_joints(df, from_joint):
    joint_names = set(joints) - set([from_joint,])
    for j in list(joint_names):
        keys = [(j,c) for c in attribs]
        df[keys] = df[j] - df[from_joint]

print df 
normalize_joints(df, 'torso') 
print df 

Output:

joint      header                             head                             neck                            torso                             pose
attrib   h_seqNum    h_stamp    user_id      pos_x      pos_y      pos_z      pos_x      pos_y      pos_z      pos_x      pos_y      pos_z       pose
0        0.067366   0.957394   0.983969   0.602662   0.505270   0.990675   0.753841   0.598397   0.846479   0.757155   0.220009   0.328470   0.686525
1        0.806405   0.800388   0.302178   0.935559   0.180360   0.322767   0.230457   0.617555   0.602589   0.109482   0.181803   0.311266   0.929481
2        0.649677   0.237286   0.963088   0.370463   0.471590   0.489256   0.060383   0.070885   0.858312   0.306232   0.511731   0.257015   0.283287
3        0.054800   0.127925   0.099985   0.700160   0.211256   0.026782   0.820380   0.922593   0.600130   0.100745   0.418157   0.869735   0.597275
4        0.678372   0.334520   0.247894   0.616133   0.914610   0.229628   0.317488   0.224910   0.620222   0.952499   0.946568   0.539502   0.838473

joint      header                             head                             neck                            torso                             pose
attrib   h_seqNum    h_stamp    user_id      pos_x      pos_y      pos_z      pos_x      pos_y      pos_z      pos_x      pos_y      pos_z       pose
0        0.067366   0.957394   0.983969  -0.154493   0.285261   0.662205  -0.003314   0.378387   0.518009   0.757155   0.220009   0.328470   0.686525
1        0.806405   0.800388   0.302178   0.826077  -0.001443   0.011501   0.120975   0.435752   0.291322   0.109482   0.181803   0.311266   0.929481
2        0.649677   0.237286   0.963088   0.064231  -0.040141   0.232241  -0.245850  -0.440846   0.601297   0.306232   0.511731   0.257015   0.283287
3        0.054800   0.127925   0.099985   0.599414  -0.206900  -0.842953   0.719635   0.504436  -0.269605   0.100745   0.418157   0.869735   0.597275
4        0.678372   0.334520   0.247894  -0.336366  -0.031958  -0.309874  -0.635011  -0.721658   0.080719   0.952499   0.946568   0.539502   0.838473
+0

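Thanks, @xndrme. Your answer raises another question for me. If df['head'] - df['torso'] produces a pd.DataFrame with the same result as your answer, why can it not be assigned to df['head']? I know it must be something related to the 'MultiIndex', but I don't understand why. – VGonPa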

+1

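The problem is that 'df['head']' on a MultiIndex is only a partial key. It works for _getting_ data, but it seems that for setting you have to provide all levels of the MultiIndex (I think this is related to the pandas implementation; maybe one of its developers can answer your question better ;) –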
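
The difference between getting and setting described in this comment can be illustrated with a small sketch. It assumes Python 3 and a recent pandas, and uses a reduced frame that holds only the joint columns. The names `original`, `head` and `head_x` are illustrative.

```python
import itertools as it

import numpy as np
import pandas as pd

joints = ('head', 'neck', 'torso')
attribs = ('pos_x', 'pos_y', 'pos_z')
columns = pd.MultiIndex.from_tuples(
    list(it.product(joints, attribs)), names=['joint', 'attrib']
)
df = pd.DataFrame(np.random.default_rng(0).random((5, len(columns))), columns=columns)

# Getting with a partial key (top level only) returns a sub-frame
# whose columns no longer carry the 'joint' level.
head = df['head']
print(type(head).__name__, head.columns.nlevels)

# Getting with a full key (one label per level) returns a single column.
head_x = df[('head', 'pos_x')]
print(type(head_x).__name__)

# Setting with full keys names every level explicitly, one column at a time.
original = df.copy()
for attrib in attribs:
    df[('head', attrib)] = original[('head', attrib)] - original[('torso', attrib)]
```

Spelling out the full tuple for each column is more verbose than assigning to `df['head']`, but it does not depend on how a given pandas version resolves a partial key during assignment.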

+0

Somehow the developers seem to have dealt with this problem. Upgrading to numpy 1.8 and pandas 0.13 solved it. – VGonPa