Python pandas: case statement in agg function
I have a SQL statement like this:
select id
, avg(case when rate=1 then rate end) as "P_Rate"
, stddev(case when rate=1 then rate end) as "std P_Rate"
, avg(case when f_rate = 1 then f_rate else 0 end) as "A_Rate"
, stddev(case when f_rate = 1 then f_rate else 0 end) as "std A_Rate"
from (
select id, connected_date,payment_type,acc_type,
max(case when s_rate > 1 then 1 else 0 end)/count(open) as rate,
sum(case when hire_days <= 5 and paid > 1000 then 1 else 0 end)/count(open) as f_rate
from analysis_table where alloc_date <= '2016-01-01' group by 1,2
) a group by id
I am trying to rewrite it with pandas. First I create the DataFrame for the "inner" table:
filtered_data = data.where(data['alloc_date'] <= analysis_date)
Then I group this data:
grouped = filtered_data.groupby(['id','connected_date'])
But I need to apply a filter to each column before taking the max/sum of it.
I am thinking of something like this:
`def my_agg_function(hire_days, paid, open):
    r_arr = []
    if hire_days <= 5 and paid > 1000:
        r_arr.append(1)
    else:
        r_arr.append(0)
    return np.max(r_arr)/len(????)
inner_table['f_rate'] = grouped.agg(lambda row: my_agg_function(row['hire_days'],row['paid'],row['open']))`
and likewise for rate.
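For reference, here is a minimal sketch of one way the whole query might be expressed in pandas. It is not a confirmed solution. It assumes each `case when` can be evaluated per row before grouping, that pandas 0.25 or later is available for named aggregation, and that the sample rows, the flag column names (`s_flag`, `f_flag`, `p_case`, `a_case`) and the value of `analysis_date` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows; column names follow the SQL in the question.
data = pd.DataFrame({
    "id":             [1, 1, 1, 2, 2],
    "connected_date": ["2015-12-01"] * 3 + ["2015-12-02"] * 2,
    "alloc_date":     pd.to_datetime(["2015-11-01", "2015-11-02", "2015-11-03",
                                      "2015-11-04", "2016-02-01"]),
    "s_rate":         [0.5, 2.0, 0.1, 0.2, 3.0],
    "hire_days":      [3, 10, 4, 2, 1],
    "paid":           [1500, 2000, 500, 1200, 5000],
    "open":           [1, 1, 1, 1, 1],
})
analysis_date = pd.Timestamp("2016-01-01")  # assumed cutoff, as in the SQL

# Boolean indexing drops non-matching rows (where() would keep them as NaN).
filtered = data[data["alloc_date"] <= analysis_date].copy()

# Evaluate each inner CASE WHEN per row as a 0/1 flag column.
filtered["s_flag"] = (filtered["s_rate"] > 1).astype(int)
filtered["f_flag"] = ((filtered["hire_days"] <= 5)
                      & (filtered["paid"] > 1000)).astype(int)

# Inner query: max/sum of the flags divided by count(open) per group.
inner = filtered.groupby(["id", "connected_date"]).agg(
    s_max=("s_flag", "max"),
    f_sum=("f_flag", "sum"),
    n_open=("open", "count"),
).reset_index()
inner["rate"] = inner["s_max"] / inner["n_open"]
inner["f_rate"] = inner["f_sum"] / inner["n_open"]

# Outer CASE WHEN: without ELSE the value becomes NaN (skipped by mean/std),
# with ELSE 0 it becomes 0.
inner["p_case"] = inner["rate"].where(inner["rate"] == 1)
inner["a_case"] = inner["f_rate"].where(inner["f_rate"] == 1, 0)

# Outer query: one row per id; ** unpacking allows output names with spaces.
outer = inner.groupby("id").agg(**{
    "P_Rate":     ("p_case", "mean"),
    "std P_Rate": ("p_case", "std"),
    "A_Rate":     ("a_case", "mean"),
    "std A_Rate": ("a_case", "std"),
})
print(outer)
```

The division here is always floating point, whereas some SQL engines perform integer division on `max(...)/count(open)`, so the results may differ from the database output.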
OK, say the number of clicks looks like (.023, 1.2, 0.4, 2.4, 2.1, .1, 2), and you want to calculate a sum, but not as (.023 + 1.2, etc.). Instead: if number_of_clicks < 1 then 0 else 1, and after this calculation sum(1 + 1 + 0 + 1 ..) – gostin
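Then do something like the following before the groupby: `df['number_of_clicks'] = df['number_of_clicks'] >= 1`. You will get a boolean `Series` (which Python also treats as 0 and 1), and the sum in the groupby will give you what you want. – ysearka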
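A minimal sketch of the suggestion in the comment above. The click values are the ones quoted in the earlier comment; the `id` grouping column and its values are assumed for illustration.

```python
import pandas as pd

# Click values from the comment; the "id" column is a hypothetical grouping key.
df = pd.DataFrame({
    "id": ["a", "a", "a", "a", "b", "b", "b"],
    "number_of_clicks": [0.023, 1.2, 0.4, 2.4, 2.1, 0.1, 2],
})

# Replace each value with True (>= 1) or False (< 1) before grouping.
df["number_of_clicks"] = df["number_of_clicks"] >= 1

# Summing booleans counts the True values in each group.
counts = df.groupby("id")["number_of_clicks"].sum()
print(counts)
```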