The easiest way to find out what a function does is to consult its docstring:
In [24]: gb.filter? # help(gb.filter) in python interpreter
Type: instancemethod
String Form:<bound method DataFrameGroupBy.filter of <pandas.core.groupby.DataFrameGroupBy object at 0x1046ad290>>
File: /Users/andy/pandas/pandas/core/groupby.py
Definition: g.filter(self, func, dropna=True, *args, **kwargs)
Docstring:
Return a copy of a DataFrame excluding elements from groups that
do not satisfy the boolean criterion specified by func.
Parameters
----------
f : function
Function to apply to each subframe. Should return True or False.
dropna : Drop groups that do not pass the filter. True by default;
if False, groups that evaluate False are filled with NaNs.
Notes
-----
Each subframe is endowed the attribute 'name' in case you need to know
which group you are working on.
Example
--------
>>> grouped = df.groupby(lambda x: mapping[x])
>>> grouped.filter(lambda x: x['A'].sum() + x['B'].sum() > 0)
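Outside IPython, the same docstring can be read programmatically. This is only a minimal sketch, assuming a reasonably recent pandas; the small `df` is made up for illustration and is not the frame from the session above.

```python
import inspect

import pandas as pd

# Hypothetical sample frame, used only to obtain a groupby object.
df = pd.DataFrame({"a": [1, 1, 2], "b": [1, 2, 3]})
gb = df.groupby("a")

# inspect.getdoc returns the cleaned-up docstring, or None if there is none.
filter_doc = inspect.getdoc(gb.filter)

# Print just the summary line; the exact wording depends on the pandas version.
print(filter_doc.splitlines()[0] if filter_doc else "<no docstring>")
```

Checking for `None` this way is also a quick test for whether a given groupby method exposes a docstring in your installed version.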
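However, there is a bug: the "fall-through" methods do not show a useful docstring. They only show the wrapper that calls the DataFrame method of the same name. For example, gb.cummin(*args, **kwargs) is equivalent to gb.apply(lambda x: x.cummin(*args, **kwargs)).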
In [31]: gb.cummin?
Type: function
String Form:<function wrapper at 0x1046a9410>
File: /Users/andy/pandas/pandas/core/groupby.py
Definition: g.cummin(*args, **kwargs)
Docstring: <no docstring>
In [32]: df.cummin?
Type: instancemethod
String Form:
<bound method DataFrame.min of a b
0 1 2
[1 rows x 2 columns]>
File: /Users/andy/pandas/pandas/core/generic.py
Definition: df.cummin(self, axis=None, dtype=None, out=None, skipna=True, **kwargs)
Docstring:
Return cumulative min over requested axis.
Parameters
----------
axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA
Returns
-------
min : Series
Here is an example that illustrates this particular method and demonstrates the equivalence:
In [41]: df = pd.DataFrame([[2, 4], [1, 5], [2, 2], [1, 3]], columns=['a', 'b'])
In [42]: df
Out[42]:
a b
0 2 4
1 1 5
2 2 2
3 1 3
In [43]: gb = df.groupby('a')
In [44]: gb.cummin()
Out[44]:
a b
0 2 4
1 1 5
2 2 2
3 1 3
In [45]: gb.apply(lambda x: x.cummin())
Out[45]:
a b
0 2 4
1 1 5
2 2 2
3 1 3
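The transcript above comes from an old pandas release; newer versions may treat the grouping column differently in the two outputs. As a hedged sketch of the same check that should not depend on that, one can select column `b` explicitly and compare the two results. The `group_keys=False` argument and the `sort_index()` call are my own additions, there only to keep the index layout comparable across versions.

```python
import pandas as pd

df = pd.DataFrame([[2, 4], [1, 5], [2, 2], [1, 3]], columns=["a", "b"])

# Select column 'b' so the comparison does not depend on whether the
# grouping column 'a' is included in the output.
gb = df.groupby("a", group_keys=False)["b"]

# Built-in groupby method.
direct = gb.cummin()

# Same computation spelled out with apply; sort_index() restores the
# original row order in case the pieces come back grouped together.
via_apply = gb.apply(lambda s: s.cummin()).sort_index()

print(direct.tolist() == via_apply.tolist())
```

If the two lists match, the built-in method and the `apply` spelling agree on this data.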
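Note: I think there is quite a lot of low-hanging fruit here (making these groupby functions more efficient, as well as adding docstrings), and we will likely see these in 0.14 ...
Some of these are documented [here](http://pandas.pydata.org/pandas-docs/stable/api.html#groupby), but some are missing. However, most of the missing ones look fairly self-explanatory, since they just apply a math function of the same name to each group (e.g. `cummin`). – BrenBarn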
These ultimately just call the DataFrame method of the same name (or a version optimized for groupby), aside from the groupby-specific methods ``transform/apply/agg/groups``. – Jeff
@Jeff I think there is also a bug here: no docstring is available. –