2014-02-13 93 views
4

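In the API documentation, I cannot find the methods of the DataFrameGroupBy class. I am wondering whether I am looking in the wrong place. Is the API reference missing the DataFrameGroupBy object?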

According to the guide, these objects have the following methods:

In [22]: gb = df.groupby('gender') 
In [23]: gb.<TAB> 
gb.agg  gb.boxplot gb.cummin  gb.describe gb.filter  gb.get_group gb.height  gb.last  gb.median  gb.ngroups gb.plot  gb.rank  gb.std  gb.transform 
gb.aggregate gb.count  gb.cumprod gb.dtype  gb.first  gb.groups  gb.hist  gb.max  gb.min  gb.nth  gb.prod  gb.resample gb.sum  gb.var 
gb.apply  gb.cummax  gb.cumsum  gb.fillna  gb.gender  gb.head  gb.indices gb.mean  gb.name  gb.ohlc  gb.quantile gb.size  gb.tail  gb.weight 

Where can I find an explanation of what they do?

+1

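Some of these are documented [here](http://pandas.pydata.org/pandas-docs/stable/api.html#groupby), but some are missing. However, many of the missing ones look fairly self-explanatory, since they just apply a mathematical function of the same name to each group (e.g. 'cummin'). – BrenBarn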

+2

These ultimately just call the DataFrame method of the same name (or a version optimized for groupby), apart from the groupby-specific methods ``transform/apply/agg/groups``. – Jeff

+0

@Jeff I think there is also a bug here: no docstrings are available. –

Answer

5

The easiest way to find out what a function does is to consult its docstring:

In [24]: gb.filter? # help(gb.filter) in python interpreter 
Type:  instancemethod 
String Form:<bound method DataFrameGroupBy.filter of <pandas.core.groupby.DataFrameGroupBy object at 0x1046ad290>> 
File:  /Users/andy/pandas/pandas/core/groupby.py 
Definition: g.filter(self, func, dropna=True, *args, **kwargs) 
Docstring: 
Return a copy of a DataFrame excluding elements from groups that 
do not satisfy the boolean criterion specified by func. 

Parameters 
---------- 
f : function 
    Function to apply to each subframe. Should return True or False. 
dropna : Drop groups that do not pass the filter. True by default; 
    if False, groups that evaluate False are filled with NaNs. 

Notes 
----- 
Each subframe is endowed the attribute 'name' in case you need to know 
which group you are working on. 

Example 
-------- 
>>> grouped = df.groupby(lambda x: mapping[x]) 
>>> grouped.filter(lambda x: x['A'].sum() + x['B'].sum() > 0) 
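Outside IPython, the same docstring can be read with `help(gb.filter)` or `inspect.getdoc`. The sketch below also runs `filter` on a small frame to show what the `dropna` parameter described above does. The frame and the threshold of 6 are made up for illustration, and the exact columns passed to the function may vary between pandas versions.

```python
import inspect

import pandas as pd

# Illustrative data (an assumption, not taken from the question).
df = pd.DataFrame({"a": [2, 1, 2, 1], "b": [4, 5, 2, 3]})
gb = df.groupby("a")

# Plain-Python counterpart of `gb.filter?`: print the docstring's first line.
doc = inspect.getdoc(gb.filter)
print(doc.splitlines()[0] if doc else "<no docstring>")

# Keep only the groups whose 'b' values sum to more than 6.
# Group a=1 has b = 5 + 3 = 8 (kept); group a=2 has b = 4 + 2 = 6 (dropped).
kept = gb.filter(lambda g: g["b"].sum() > 6)
print(kept)

# With dropna=False the failing rows stay in place but are filled with NaN.
masked = gb.filter(lambda g: g["b"].sum() > 6, dropna=False)
print(masked)
```

The function passed to `filter` must return a single True or False per group; the result keeps the original row labels of the surviving groups.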

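However, there is a bug: the "fall-through" methods do not show a useful docstring, and instead only show the wrapper around the DataFrame method they call. For example, gb.cummin(*args, **kwargs) is equivalent to gb.apply(lambda x: x.cummin(*args, **kwargs))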

In [31]: gb.cummin? 
Type:  function 
String Form:<function wrapper at 0x1046a9410> 
File:  /Users/andy/pandas/pandas/core/groupby.py 
Definition: g.cummin(*args, **kwargs) 
Docstring: <no docstring> 

In [32]: df.cummin? 
Type:  instancemethod 
String Form: 
<bound method DataFrame.min of a b 
0 1 2 

[1 rows x 2 columns]> 
File:  /Users/andy/pandas/pandas/core/generic.py 
Definition: df.cummin(self, axis=None, dtype=None, out=None, skipna=True, **kwargs) 
Docstring: 
Return cumulative min over requested axis. 

Parameters 
---------- 
axis : {index (0), columns (1)} 
skipna : boolean, default True 
    Exclude NA/null values. If an entire row/column is NA, the result 
    will be NA 

Returns 
------- 
min : Series 
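The `<function wrapper ...>` with `<no docstring>` seen above is what a generic delegating closure looks like when nothing copies the target's metadata onto it. The following is a minimal sketch of that effect using only the standard library. `Frame`, `Grouped`, and `make_wrapper` are hypothetical stand-ins, not pandas classes, and this is not how pandas implements its wrappers.

```python
import functools


class Frame:
    """Hypothetical stand-in for a DataFrame-like object."""

    def __init__(self, values):
        self.values = list(values)

    def cummin(self):
        """Return the cumulative minimum of the values."""
        out, current = [], None
        for v in self.values:
            current = v if current is None else min(current, v)
            out.append(current)
        return out


class Grouped:
    """Hypothetical stand-in for a groupby object: a dict of key -> Frame."""

    def __init__(self, groups):
        self.groups = groups

    def make_wrapper(self, name, copy_doc):
        target = getattr(Frame, name)

        def wrapper(*args, **kwargs):
            # Delegate to the same-named Frame method on every group.
            return {
                key: getattr(frame, name)(*args, **kwargs)
                for key, frame in self.groups.items()
            }

        if copy_doc:
            # functools.wraps copies __name__, __doc__, etc. from the target.
            wrapper = functools.wraps(target)(wrapper)
        return wrapper


g = Grouped({1: Frame([5, 3]), 2: Frame([4, 2])})
bare = g.make_wrapper("cummin", copy_doc=False)
documented = g.make_wrapper("cummin", copy_doc=True)

print(bare.__doc__)        # None: the closure has no docstring of its own
print(documented.__doc__)  # docstring copied from Frame.cummin
print(documented())
```

Both wrappers compute the same result; only the copied metadata differs, which is why the missing docstring is a documentation problem rather than a behavioural one.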

Here is an example that explains this particular method and demonstrates the equivalence:

In [41]: df = pd.DataFrame([[2, 4], [1, 5], [2, 2], [1, 3]], columns=['a', 'b']) 

In [42]: df 
Out[42]: 
    a b 
0 2 4 
1 1 5 
2 2 2 
3 1 3 

In [43]: gb = df.groupby('a') 

In [44]: gb.cummin() 
Out[44]: 
    a b 
0 2 4 
1 1 5 
2 2 2 
3 1 3 

In [45]: gb.apply(lambda x: x.cummin()) 
Out[45]: 
    a b 
0 2 4 
1 1 5 
2 2 2 
3 1 3 
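With this particular data the per-group cumulative minimum happens to equal the input, so the effect is easy to miss. The sketch below repeats the comparison on column b only and adds a second, made-up frame (`other`) where the values actually change. It selects a single column and passes `group_keys=False` because the shape and index of `apply` results differ between pandas versions; those choices are assumptions for portability, not part of the original answer.

```python
import pandas as pd

# Same frame as in the transcript above.
df = pd.DataFrame([[2, 4], [1, 5], [2, 2], [1, 3]], columns=["a", "b"])
# Hypothetical second frame where the cumulative minimum differs from the input.
other = pd.DataFrame({"a": [2, 1, 2, 1], "b": [2, 3, 4, 5]})


def both_ways(frame):
    """Return the per-group cummin of column 'b', computed two ways."""
    direct = frame.groupby("a")["b"].cummin()
    via_apply = (
        frame.groupby("a", group_keys=False)["b"]
        .apply(lambda s: s.cummin())
        .sort_index()  # restore the original row order before comparing
    )
    return direct.tolist(), via_apply.tolist()


direct_df, apply_df = both_ways(df)
direct_other, apply_other = both_ways(other)

print(direct_df, apply_df)
print(direct_other, apply_other)
```

In `other`, group a=2 holds b values 2 and 4 and group a=1 holds 3 and 5, so the second value in each group is replaced by the group's running minimum.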

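Note: I think there is quite a lot of low-hanging fruit here (making these groupby functions more efficient, as well as adding docstrings), which we will most likely get to in 0.14...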