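Filtering rows within groups in pandas

I have a grouped DataFrame, and I want to filter the rows within each group based on their values.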
Here is what I tried:
figure_cols = list("ABC")

def get_threshold_for_IV(gr_vals):
    return (gr_vals[figure_cols].max())/(gr_vals["A"].count())

def filter_IV(group):
    A_tr, B_tr, C_tr = get_threshold_for_IV(group)
    return group[(group.A >= A_tr) & (group.B >= B_tr) & (group.C >= C_tr)]

# attempt 1
grouped.apply(filter_IV)

# attempt 2
for name, group in grouped:
    A_tr, B_tr, C_tr = get_threshold_for_IV(group)
    group = group[(group.A < A_tr) & (group.B < B_tr) & (group.C < C_tr)]
But none of this works: the data don't change. The functions themselves work; if I insert a print inside the loop, I can see the filtered result.
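Both attempts behave this way for ordinary Python reasons, sketched below with a small hypothetical frame (not the original data). `grouped.apply(filter_IV)` builds and returns a new DataFrame, which the first attempt never assigns to anything, and `group = ...` inside the loop only rebinds the local name `group`, leaving the GroupBy object untouched. Keeping the result means assigning it, for example by collecting the filtered pieces and concatenating them:

```python
import pandas as pd

# Small hypothetical frame standing in for the real data
df = pd.DataFrame({"gr": ["foo", "bar", "foo", "bar"],
                   "A": [0, 1, 2, 3]})
grouped = df.groupby("gr")

# Rebinding the loop variable creates a new object; `grouped` is unaffected
for name, group in grouped:
    group = group[group.A < 2]

# Collect the filtered pieces instead, then combine them into one frame
pieces = [group[group.A < 2] for _, group in grouped]
filtered = pd.concat(pieces)

print(filtered.sort_index())
```

After the first loop, `grouped.get_group("foo")` still has both of its original rows; only `filtered` holds the reduced data.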
One more thing I should mention: after filtering, I want to do further operations on the grouped object.
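If the goal is to keep the rows that meet a per-group threshold and then continue working with a grouped object, one possible approach is to build a boolean mask aligned with the original rows using `groupby(...).transform`, filter the DataFrame with it, and group the result again. The sketch below assumes the threshold is the per-group column maximum divided by the group's row count, mirroring `get_threshold_for_IV`, and uses made-up data. Note that `GroupBy.filter` is a different tool: it keeps or drops whole groups, not individual rows.

```python
import pandas as pd

figure_cols = list("ABC")

# Hypothetical data; 'gr' is the grouping column as in the question
df = pd.DataFrame({"gr": ["foo", "bar", "foo", "bar", "foo"],
                   "A": [1, 4, 6, 2, 3],
                   "B": [2, 2, 8, 6, 1],
                   "C": [5, 3, 4, 8, 9]})

g = df.groupby("gr")

# Per-row thresholds: column max within the row's group / group size
# (same formula as get_threshold_for_IV, broadcast back to every row)
thresholds = g[figure_cols].transform("max").div(g["A"].transform("count"), axis=0)

# Keep rows where every figure column is >= its group's threshold
mask = (df[figure_cols] >= thresholds).all(axis=1)
filtered = df[mask]

# Re-group the filtered rows for further per-group work
regrouped = filtered.groupby("gr")
print(regrouped.size())
```

Because `transform` returns results indexed like `df`, the comparison lines up row by row without any explicit loop.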
I have read the documentation, but I seem to be missing something. Can anyone help?
EDIT

Added a self-contained example:
import numpy as np
import pandas as pd

df = pd.DataFrame({'gr' : ['foo', 'bar', 'foo', 'bar',
                           'foo', 'bar', 'foo', 'foo'],
                   'A' : np.arange(8),
                   'B' : np.random.randn(8),
                   'C' : np.random.randn(8)})

def filter_gt_3(group):
    # keeps only the rows where A < 3
    return group[group.A < 3]

grouped = df.groupby('gr')

for name, group in grouped:
    print('group name: %s' % name)
    print(group)
    group = filter_gt_3(group)
    print("\nfiltered")
    print(group)

print('\n----------\n')
print('Nothing filtered:\n')
for name, group in grouped:
    print('group name: %s' % name)
    print(group)
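In this self-contained example the condition in `filter_gt_3` (keep rows with `A < 3`) does not depend on any per-group statistic, so a simpler route, sketched below as one possible approach rather than the only one, is to filter the DataFrame first and group the result. The resulting `grouped` object then contains only the kept rows and can be used for further per-group operations:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'gr': ['foo', 'bar', 'foo', 'bar',
                          'foo', 'bar', 'foo', 'foo'],
                   'A': np.arange(8),
                   'B': np.random.randn(8),
                   'C': np.random.randn(8)})

# The condition A < 3 does not depend on any group statistic,
# so the rows can be filtered before grouping
grouped = df[df.A < 3].groupby('gr')

for name, group in grouped:
    print('group name: %s' % name)
    print(group)
```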
Output:
group name: bar
   A         B         C   gr
1  1  1.486028 -0.382597  bar
3  3 -0.501757 -0.771807  bar
5  5 -0.836930 -1.514824  bar

filtered
   A         B         C   gr
1  1  1.486028 -0.382597  bar
group name: foo
   A         B         C   gr
0  0  0.678104 -0.940245  foo
2  2  1.539903  1.460493  foo
4  4 -0.033421 -1.078566  foo
6  6  1.146298  0.039721  foo
7  7  1.095707 -1.032275  foo

filtered
   A         B         C   gr
0  0  0.678104 -0.940245  foo
2  2  1.539903  1.460493  foo

----------

Nothing filtered:

group name: bar
   A         B         C   gr
1  1  1.486028 -0.382597  bar
3  3 -0.501757 -0.771807  bar
5  5 -0.836930 -1.514824  bar
group name: foo
   A         B         C   gr
0  0  0.678104 -0.940245  foo
2  2  1.539903  1.460493  foo
4  4 -0.033421 -1.078566  foo
6  6  1.146298  0.039721  foo
7  7  1.095707 -1.032275  foo
Can you provide a self-contained, runnable example that demonstrates the problem? – BrenBarn
@BrenBarn I've added an example. –