2014-03-03 132 views
time_period  total_cost  total_revenue
7days        150         250
14days       350         600
30days       900         750
7days        180         400
14days       430         620

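Given this data, I want to conditionally perform a calculation on a pandas DataFrame: convert the total_cost and total_revenue columns into daily averages for the given time period. I thought this would work:

df[['total_cost','total_revenue']][df.time_period=="7days"] = df[['total_cost','total_revenue']][df.time_period=="7days"]/7

But it returns the DataFrame unchanged.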

Answers


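I believe you are operating on a copy of the DataFrame, so the assignment never reaches the original. I think you should use apply: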

from io import StringIO  # Python 3 location of StringIO
import pandas

datastring = StringIO("""\
time_period  total_cost  total_revenue
7days        150         250
14days       350         600
30days       900         750
7days        180         400
14days       430         620
""")

# Columns are separated by runs of whitespace
data = pandas.read_table(datastring, sep=r'\s+')

# Strip the trailing "days" (4 characters) to get the period length, then divide
data['total_cost_avg'] = data.apply(
    lambda row: row['total_cost'] / float(row['time_period'][:-4]),
    axis=1
)

This gives me:

  time_period  total_cost  total_revenue  total_cost_avg
0       7days         150            250       21.428571
1      14days         350            600       25.000000
2      30days         900            750       30.000000
3       7days         180            400       25.714286
4      14days         430            620       30.714286
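
If the goal is to change only the 7days rows in place, as in the question, a single .loc call with a boolean mask avoids the chained indexing that ends up writing to a copy. This is a minimal sketch under the assumption that the data matches the table above; the names df, cols and mask are illustrative only.

```python
import pandas as pd

# Same data as in the question, built inline so the example is self-contained
df = pd.DataFrame({
    "time_period": ["7days", "14days", "30days", "7days", "14days"],
    "total_cost": [150, 350, 900, 180, 430],
    "total_revenue": [250, 600, 750, 400, 620],
})

cols = ["total_cost", "total_revenue"]

# Cast to float first so the averaged values fit the column dtype
df = df.astype({"total_cost": float, "total_revenue": float})

# Select rows and columns in one .loc call, so the assignment
# targets df itself rather than a temporary copy
mask = df["time_period"] == "7days"
df.loc[mask, cols] = df.loc[mask, cols] / 7
```

Rows with other time periods are left untouched; repeating the last two lines with "14days" and 14, or "30days" and 30, would handle the remaining periods.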

Comment: You could also use str.extract to pull out the number of days :) It kind of feels like there should be a timedelta way of doing this :s
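
One possible reading of the timedelta idea in the comment is sketched below: extract the digits with str.extract, turn them into timedeltas with pandas.to_timedelta, and divide by a one-day Timedelta to get the divisor. It assumes every period is a whole number of days written like "14days"; the names n_days and period are illustrative and not from the original posts.

```python
import pandas as pd

df = pd.DataFrame({
    "time_period": ["7days", "14days", "30days", "7days", "14days"],
    "total_cost": [150, 350, 900, 180, 430],
    "total_revenue": [250, 600, 750, 400, 620],
})

# Pull the leading digits out of strings such as "14days"
n_days = df["time_period"].str.extract(r"(\d+)days", expand=False).astype(int)

# Represent each period as a timedelta (assumption: whole days only)
period = pd.to_timedelta(n_days, unit="D")

# Dividing by a one-day Timedelta yields the period length in days as a float
df["total_cost_avg"] = df["total_cost"] / (period / pd.Timedelta(days=1))
```

For data that is always in whole days, dividing by n_days directly gives the same result; the timedelta column is mainly useful if the periods later need to be combined with dates.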


Excellent answer by Paul. Adding my approach here:

import pandas as pd

test_df = pd.read_csv("file1.csv")
test_df

  time_period  total_cost  total_revenue
0       7days         150            250
1      14days         350            600
2      30days         900            750
3       7days         180            400
4      14days         430            620

# Extract the number of days from strings such as "14days"
test_df['days'] = test_df.time_period.str.extract(r'(\d+)days', expand=False).astype(int)
test_df['total_cost'] = test_df.total_cost / test_df.days
test_df['total_revenue'] = test_df.total_revenue / test_df.days
del test_df['days']  # drop the helper column
test_df


  time_period  total_cost  total_revenue
0       7days   21.428571      35.714286
1      14days   25.000000      42.857143
2      30days   30.000000      25.000000
3       7days   25.714286      57.142857
4      14days   30.714286      44.285714