How do I correctly return a formatted pandas DataFrame from apply?
Suppose we have the following DataFrame:
import pandas as pd
import numpy as np
years = [2005, 2006]
location = ['city', 'suburb']
dft = pd.DataFrame({
    'year': [years[np.random.randint(0, 1+1)] for _ in range(100)],
    'location': [location[np.random.randint(0, 1+1)] for _ in range(100)],
    'days_to_complete': np.random.randint(100, high=600, size=100),
    'cost_in_millions': np.random.randint(1, high=10, size=100)
})
We group by year and location, then apply a function like the following:
def get_custom_summary(group):
    gt_200 = group.days_to_complete > 200
    lt_200 = group.days_to_complete < 200
    avg_days_gt200 = group[gt_200].days_to_complete.mean()
    avg_cost_gt200 = group[gt_200].cost_in_millions.mean()
    avg_days_lt200 = group[lt_200].days_to_complete.mean()
    avg_cost_lt200 = group[lt_200].cost_in_millions.mean()
    lt_200_prop = lt_200.sum()/(gt_200.sum() + lt_200.sum())
    return pd.DataFrame({
        'gt_200': {'AVG_DAYS': avg_days_gt200, 'AVG_COST': avg_cost_gt200},
        'lt_200': {'avg_days': avg_days_lt200, 'avg_cost': avg_cost_lt200},
        'lt_200_prop' : lt_200_prop
    })
result = dft.groupby(['year', 'location']).apply(get_custom_summary)
Calling unstack(2) on the result gives the following output:
print(result.unstack(2))
                   gt_200                                          lt_200                                     lt_200_prop
                 AVG_COST    AVG_DAYS    avg_cost    avg_days    AVG_COST    AVG_DAYS    avg_cost    avg_days    AVG_COST    AVG_DAYS    avg_cost    avg_days
year location
2005 city        4.818182  415.636364         NaN         NaN         NaN         NaN    7.250000      165.50    0.153846    0.153846    0.153846    0.153846
     suburb      5.631579  336.631579         NaN         NaN         NaN         NaN    5.166667      140.50    0.240000    0.240000    0.240000    0.240000
2006 city        4.130435  396.913043         NaN         NaN         NaN         NaN    5.750000      150.75    0.258065    0.258065    0.258065    0.258065
     suburb      5.294118  392.823529         NaN         NaN         NaN         NaN    1.000000      128.00    0.055556    0.055556    0.055556    0.055556
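For the gt_200 and lt_200 columns, a call to dropna(axis=1) will remove the NaN-filled columns, but the lt_200_prop column is still stuck with the wrong sub-column names. How can I return a DataFrame from get_custom_summary that has the columns (gt_200, lt_200, lt_200_prop) without broadcasting (if that is the right word) lt_200_prop across the sub-columns (AVG_COST, AVG_DAYS, avg_cost, avg_days)?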
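To make the symptom concrete, here is a minimal sketch of what the DataFrame constructor in get_custom_summary does for a single group. The numbers are made-up stand-ins, not values from the data above. The row index becomes the union of the inner dict keys, and the scalar lt_200_prop is repeated on every one of those rows, which is where the four identical sub-columns come from after unstack(2).

```python
import pandas as pd

# Hypothetical values standing in for one group's summary.
frame = pd.DataFrame({
    'gt_200': {'AVG_DAYS': 400.0, 'AVG_COST': 5.0},
    'lt_200': {'avg_days': 150.0, 'avg_cost': 6.0},
    'lt_200_prop': 0.25,  # scalar: repeated on every row of the union index
})

# Four rows (AVG_DAYS, AVG_COST, avg_days, avg_cost) and three columns;
# gt_200 and lt_200 are NaN on the rows they do not define.
print(frame)
```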
Edit:
Desired output:
                   gt_200                  lt_200             lt_200_prop
                 AVG_COST    AVG_DAYS    avg_cost    avg_days
year location
2005 city        4.818182  415.636364    7.250000      165.50    0.153846
     suburb      5.631579  336.631579    5.166667      140.50    0.240000
2006 city        4.130435  396.913043    5.750000      150.75    0.258065
     suburb      5.294118  392.823529    1.000000      128.00    0.055556
Can you add the desired output? – jezrael
@jezrael I just added the desired output. – Jay
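One possible approach, offered as a sketch rather than a definitive answer: have the function return a Series whose index is a two-level MultiIndex, so that apply lays the groups out as rows and the MultiIndex as columns, with no unstack or dropna step. The COLUMNS constant, the empty second-level label for lt_200_prop, and the small deterministic sample data are assumptions made for illustration. The value columns are selected before apply so the function only sees the columns it uses.

```python
import pandas as pd

# Small deterministic stand-in for the random data in the question.
dft = pd.DataFrame({
    'year': [2005, 2005, 2005, 2006, 2006, 2006],
    'location': ['city', 'city', 'city', 'suburb', 'suburb', 'suburb'],
    'days_to_complete': [100, 300, 500, 150, 250, 450],
    'cost_in_millions': [2, 4, 6, 1, 3, 5],
})

# Hypothetical column layout; '' is used as the sub-label of lt_200_prop.
COLUMNS = pd.MultiIndex.from_tuples([
    ('gt_200', 'AVG_COST'), ('gt_200', 'AVG_DAYS'),
    ('lt_200', 'avg_cost'), ('lt_200', 'avg_days'),
    ('lt_200_prop', ''),
])

def get_custom_summary(group):
    gt_200 = group.days_to_complete > 200
    lt_200 = group.days_to_complete < 200
    values = [
        group.loc[gt_200, 'cost_in_millions'].mean(),
        group.loc[gt_200, 'days_to_complete'].mean(),
        group.loc[lt_200, 'cost_in_millions'].mean(),
        group.loc[lt_200, 'days_to_complete'].mean(),
        lt_200.sum() / (gt_200.sum() + lt_200.sum()),
    ]
    # One row per group: the Series index becomes the result's columns.
    return pd.Series(values, index=COLUMNS)

result = (dft.groupby(['year', 'location'])[['days_to_complete', 'cost_in_millions']]
             .apply(get_custom_summary))
print(result)
```

Because every group returns a Series with the same index, each scalar occupies exactly one column, so lt_200_prop is not repeated under the other sub-column names.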