2017-08-11

Pandas: custom aggregation function for DataFrameGroupBy

I have the following DataFrame my_df:

name  date        A_score  B_score
-----------------------------------
John  2017-01-01        5        6
John  2017-01-10       10        8
John  2017-02-04        3        5
Andy  2017-01-25        8        9
Andy  2017-02-05        7        1
Andy  2017-02-12        9        9
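To experiment with the question, the sample data can be rebuilt as follows. This is a sketch: the question does not say whether date is stored as strings or as datetimes, so it is parsed with pd.to_datetime here on the assumption that it should sort chronologically.

```python
import pandas as pd

# Sample data from the question; dates are parsed so that sorting is chronological.
my_df = pd.DataFrame({
    "name": ["John", "John", "John", "Andy", "Andy", "Andy"],
    "date": pd.to_datetime(["2017-01-01", "2017-01-10", "2017-02-04",
                            "2017-01-25", "2017-02-05", "2017-02-12"]),
    "A_score": [5, 10, 3, 8, 7, 9],
    "B_score": [6, 8, 5, 9, 1, 9],
})
print(my_df)
```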

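For each name, we want the absolute delta of A_score and of B_score. The absolute delta is defined as the absolute difference between the value on the earliest date and the value on the second-earliest date.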
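The definition can be spelled out step by step without a custom aggregation function. In the sketch below, abs_delta_first_two is a hypothetical helper name: it sorts by name and date, numbers the rows within each name with cumcount (0 = earliest, 1 = second-earliest), and subtracts the two selected rows, which pandas aligns on name.

```python
import pandas as pd

my_df = pd.DataFrame({
    "name": ["John", "John", "John", "Andy", "Andy", "Andy"],
    "date": pd.to_datetime(["2017-01-01", "2017-01-10", "2017-02-04",
                            "2017-01-25", "2017-02-05", "2017-02-12"]),
    "A_score": [5, 10, 3, 8, 7, 9],
    "B_score": [6, 8, 5, 9, 1, 9],
})

def abs_delta_first_two(df, cols):
    """Hypothetical helper: |value at 2nd-earliest date - value at earliest date| per name."""
    # Sort so that, within each name, rows run from earliest to latest date.
    ordered = df.sort_values(["name", "date"])
    # Position of each row inside its name group: 0 = earliest, 1 = second-earliest.
    pos = ordered.groupby("name").cumcount()
    earliest = ordered[pos == 0].set_index("name")[cols]
    second = ordered[pos == 1].set_index("name")[cols]
    # Subtraction aligns on name; a name with only one row would end up as NaN.
    return (second - earliest).abs()

result = abs_delta_first_two(my_df, ["A_score", "B_score"]).add_suffix("_result")
print(result)
```

Because the selection is done by position after sorting, the original order of the rows does not matter.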

The resulting DataFrame should look like this:

name  A_score_result  B_score_result
------------------------------------
John               5               2
Andy               1               8

To achieve this, I tried:

new_df = my_df.groupby('name').apply(lambda x:myFun(x)) 

new_df = my_df.groupby('name').agg(['myFun'])  

where myFun is:

def myFun(x): 
    y = x[2]-x[1] 
    return y 

However, both approaches fail with an error similar to the following:

/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py in __getitem__(self, key) 
    2057    return self._getitem_multilevel(key) 
    2058   else: 
-> 2059    return self._getitem_column(key) 
    2060 
    2061  def _getitem_column(self, key): 

/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py in _getitem_column(self, key) 
    2064   # get column 
    2065   if self.columns.is_unique: 
-> 2066    return self._get_item_cache(key) 
    2067 
    2068   # duplicate columns & possible reduce dimensionality 

/usr/local/lib/python3.4/dist-packages/pandas/core/generic.py in _get_item_cache(self, item) 
    1384   res = cache.get(item) 
    1385   if res is None: 
-> 1386    values = self._data.get(item) 
    1387    res = self._box_item_values(item, values) 
    1388    cache[item] = res 

/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in get(self, item, fastpath) 
    3541 
    3542    if not isnull(item): 
-> 3543     loc = self.items.get_loc(item) 
    3544    else: 
    3545     indexer = np.arange(len(self.items))[isnull(self.items)] 

/usr/local/lib/python3.4/dist-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance) 
    2134     return self._engine.get_loc(key) 
    2135    except KeyError: 
-> 2136     return self._engine.get_loc(self._maybe_cast_indexer(key)) 
    2137 
    2138   indexer = self.get_indexer([key], method=method, tolerance=tolerance) 

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4145)() 

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4009)() 

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13166)() 

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13120)() 

KeyError: 0 
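The traceback passes through DataFrame.__getitem__, which suggests that x inside myFun was a sub-DataFrame and the integer was looked up as a column label rather than as a row position. Square brackets in pandas are label-based: on a DataFrame they select columns by name, and on a Series with an integer index they select by label, and each group keeps its original row labels (3, 4, 5 for Andy). The sketch below uses a hand-built copy of Andy's rows (an assumption standing in for what the group function receives) to reproduce both lookups and to show that .iloc gives the positional access myFun was after.

```python
import pandas as pd

# Andy's rows as a group sees them: the original row labels 3, 4, 5 are kept.
andy = pd.DataFrame({"A_score": [8, 7, 9], "B_score": [9, 1, 9]}, index=[3, 4, 5])

# On a DataFrame, andy[0] means "the column labelled 0", which does not exist.
try:
    andy[0]
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

# On a Series with integer labels, [] is label-based, so label 1 is missing too.
scores = andy["A_score"]
try:
    scores[1]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# .iloc is purely positional: position 0 is the first row, position 1 the second.
delta = abs(scores.iloc[1] - scores.iloc[0])
print(column_lookup_failed, label_lookup_failed, delta)  # True True 1
```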

Any suggestions on how to fix this? Thanks a lot!

Answer

Try this:

In [358]: my_df.drop(columns='date').groupby('name').agg(lambda x: abs(x.iloc[1] - x.iloc[0]))
Out[358]: 
      A_score  B_score
name
Andy        1        8
John        5        2
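The answer takes the first two rows of each group as they appear, so it relies on each name's rows already being in date order, which holds for the sample data but may not in general. A hedged variant sorts by date first and adds the _result suffix from the desired output. To show why the sort matters, the rows below are deliberately shuffled (an assumption, not the question's data order). It still assumes every name has at least two rows, because x.iloc[1] raises IndexError otherwise.

```python
import pandas as pd

# Same values as the question, but rows deliberately out of date order.
my_df = pd.DataFrame({
    "name": ["John", "John", "Andy", "John", "Andy", "Andy"],
    "date": pd.to_datetime(["2017-02-04", "2017-01-01", "2017-02-12",
                            "2017-01-10", "2017-01-25", "2017-02-05"]),
    "A_score": [3, 5, 9, 10, 8, 7],
    "B_score": [5, 6, 9, 8, 9, 1],
})

result = (
    my_df.sort_values("date")          # position 0 becomes the earliest date per name
         .drop(columns="date")         # keyword form; positional axis was removed in pandas 2.0
         .groupby("name")              # groupby keeps the sorted row order within each group
         .agg(lambda x: abs(x.iloc[1] - x.iloc[0]))
         .add_suffix("_result")        # match the column names of the desired output
)
print(result)
```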