2012-09-13 72 views
8

我經常想通過組合多個DataFrame的列來創建一個新的DataFrame。該申請()函數允許我這樣做,但它要求我創建不必要的索引:如何使用Pandas groupby apply()而不添加額外的索引

In [359]: df = pandas.DataFrame({'x': 3 * ['a'] + 2 * ['b'], 'y': np.random.normal(size=5), 'z': np.random.normal(size=5)}) 

In [360]: df 
Out[360]: 
    x   y   z 
0 a 0.201980 -0.470388 
1 a 0.190846 -2.089032 
2 a -1.131010 0.227859 
3 b -0.263865 -1.906575 
4 b -1.335956 -0.722087 

In [361]: df.groupby('x').apply(lambda x: pandas.DataFrame({'r': (x.y + x.z).sum()/x.z.sum(), 's': (x.y + x.z ** 2).sum()/x.z.sum()})) 
--------------------------------------------------------------------------- 
ValueError        Traceback (most recent call last) 
/home/emarkley/work/src/partner_analysis2/main.py in <module>() 
----> 1 df.groupby('x').apply(lambda x: pandas.DataFrame({'r': (x.y + x.z).sum()/x.z.sum(), 's': (x.y + x.z ** 2).sum()/x.z.sum()})) 

/usr/local/lib/python3.2/site-packages/pandas-0.8.2.dev-py3.2-linux-x86_64.egg/pandas/core/groupby.py in apply(self, func, *args, **kwargs) 
    267   applied : type depending on grouped object and function 
    268   """ 
--> 269   return self._python_apply_general(func, *args, **kwargs) 
    270 
    271  def aggregate(self, func, *args, **kwargs): 

/usr/local/lib/python3.2/site-packages/pandas-0.8.2.dev-py3.2-linux-x86_64.egg/pandas/core/groupby.py in _python_apply_general(self, func, *args, **kwargs) 
    417    group_axes = _get_axes(group) 
    418 
--> 419    res = func(group, *args, **kwargs) 
    420 
    421    if not _is_indexed_like(res, group_axes): 

/home/emarkley/work/src/partner_analysis2/main.py in <lambda>(x) 
----> 1 df.groupby('x').apply(lambda x: pandas.DataFrame({'r': (x.y + x.z).sum()/x.z.sum(), 's': (x.y + x.z ** 2).sum()/x.z.sum()})) 

/usr/local/lib/python3.2/site-packages/pandas-0.8.2.dev-py3.2-linux-x86_64.egg/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy) 
    371    mgr = self._init_mgr(data, index, columns, dtype=dtype, copy=copy) 
    372   elif isinstance(data, dict): 
--> 373    mgr = self._init_dict(data, index, columns, dtype=dtype) 
    374   elif isinstance(data, ma.MaskedArray): 
    375    mask = ma.getmaskarray(data) 

/usr/local/lib/python3.2/site-packages/pandas-0.8.2.dev-py3.2-linux-x86_64.egg/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype) 
    454   # figure out the index, if necessary 
    455   if index is None: 
--> 456    index = extract_index(data) 
    457   else: 
    458    index = _ensure_index(index) 

/usr/local/lib/python3.2/site-packages/pandas-0.8.2.dev-py3.2-linux-x86_64.egg/pandas/core/frame.py in extract_index(data) 
    4719 
    4720   if not indexes and not raw_lengths: 
-> 4721    raise ValueError('If use all scalar values, must pass index') 
    4722 
    4723   if have_series or have_dicts: 

ValueError: If use all scalar values, must pass index 

In [362]: df.groupby('x').apply(lambda x: pandas.DataFrame({'r': (x.y + x.z).sum()/x.z.sum(), 's': (x.y + x.z ** 2).sum()/x.z.sum()}, index=[0])) 
Out[362]: 
      r   s 
x      
a 0 1.316605 -1.672293 
b 0 1.608606 -0.972593 

有沒有辦法使用申請()或其他一些功能來獲得沒有額外的指數相同的結果零點?

回答

9

你產生每組的合計R和S值,所以你應該在這裏使用Series

In [26]: df.groupby('x').apply(lambda x: 
      Series({'r': (x.y + x.z).sum()/x.z.sum(), 
        's': (x.y + x.z ** 2).sum()/x.z.sum()})) 
Out[26]: 
      r   s 
x      
a -0.338590 -0.916635 
b 66.655533 102.566146 
+0

完美!我不知道我是如何錯過的。 – user2303

+0

謝謝你。我也不知道我應該返回一個系列,也嘗試返回一個DataFrame。這些信息(應該返回一個系列)應該在官方文檔中。 – JustAC0der

相關問題