2017-12-27 1277 views
0

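Pandas - use an aggregate function on the last N values of a column, ordered by a datetime column in the same DataFrame

I have a DataFrame containing sports betting data: match_id, team_id, goals_scored, and a datetime column with the time the match started. I want to add a column to this DataFrame that shows, for each row, the sum of the goals that team scored in its previous n matches.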

+2

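Can you provide sample data and show what you want the output to look like? Your description is clear, but it is easier to build an answer when we have something to work from. –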

Answers

1

I made up some mock data because I like football, but as Jacob H suggested, it is always best to provide a sample DataFrame with the question.

import pandas as pd 
import numpy as np 
np.random.seed(2) 

d = {'match_id': np.arange(10) 
     ,'team_id': ['City','City','City','Utd','Utd','Utd','Albion','Albion','Albion','Albion'] 
     ,'goals_scored': np.random.randint(0,5,10) 
     ,'time_played': [0,1,2,0,1,2,0,1,2,3]} 
df = pd.DataFrame(data=d) 

#previous n matches 
n=2 

#some weekly 3pm kickoffs (freq='W' anchors on Sundays, so these fall on Sundays).
rng = pd.date_range('2017-12-02 15:00:00','2017-12-25 15:00:00',freq='W') 

# change the time_played integers to the datetimes 
df['time_played'] = df['time_played'].map(lambda x: rng[x]) 

#be sure the sort order is correct 
df = df.sort_values(['team_id','time_played']) 

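# a rolling sum() and then shift(1) to align value with row as per question 
# transform returns a result aligned to df's index (apply can prepend the group key on newer pandas)
df['total_goals'] = df.groupby(['team_id'])['goals_scored'].transform(lambda x: x.rolling(n).sum()) 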
df['total_goals'] = df.groupby(['team_id'])['total_goals'].shift(1) 

Which produces:

   goals_scored  match_id team_id         time_played  total_goals (in previous n)
6             2         6  Albion 2017-12-03 15:00:00          NaN
7             1         7  Albion 2017-12-10 15:00:00          NaN
8             3         8  Albion 2017-12-17 15:00:00          3.0
9             2         9  Albion 2017-12-24 15:00:00          4.0
0             0         0    City 2017-12-03 15:00:00          NaN
1             0         1    City 2017-12-10 15:00:00          NaN
2             3         2    City 2017-12-17 15:00:00          0.0
3             2         3     Utd 2017-12-03 15:00:00          NaN
4             3         4     Utd 2017-12-10 15:00:00          NaN
5             0         5     Utd 2017-12-17 15:00:00          5.0
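
The two steps above can also be folded into a single helper. This is only a minimal sketch, assuming the frame has the `team_id`, `time_played` and `goals_scored` columns used above; the function name `add_previous_n_goals` and the small hand-written frame are illustrative and not part of the original answer.

```python
import pandas as pd

def add_previous_n_goals(df, n, team_col="team_id", time_col="time_played",
                         goals_col="goals_scored", out_col="total_goals"):
    """Return a sorted copy of df with each team's goals in its previous n matches."""
    out = df.sort_values([team_col, time_col]).copy()
    # shift(1) drops the current match from the window; rolling(n) then sums
    # the n matches before it. Rows with fewer than n earlier matches get NaN.
    out[out_col] = out.groupby(team_col)[goals_col].transform(
        lambda s: s.shift(1).rolling(n).sum()
    )
    return out

# Hypothetical sample data, chosen so the sums are easy to check by hand
matches = pd.DataFrame({
    "match_id": [0, 1, 2, 3, 4, 5],
    "team_id": ["City", "Utd", "City", "Utd", "City", "City"],
    "goals_scored": [1, 2, 0, 3, 4, 2],
    "time_played": pd.to_datetime([
        "2017-12-03 15:00", "2017-12-03 15:00", "2017-12-10 15:00",
        "2017-12-10 15:00", "2017-12-17 15:00", "2017-12-24 15:00",
    ]),
})

result = add_previous_n_goals(matches, n=2)
print(result)
```

If partial history is acceptable, passing `min_periods=1` to `rolling` should give a partial sum instead of NaN for rows that have at least one earlier match.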
+0

This is perfect! – L1meta

1

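There may be a more efficient way to do this with an aggregate function, but here is a solution where, for each entry, you filter the entire DataFrame to isolate the team and the date range, and then sum the goals.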

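# for each row, sum goals_scored over the same team's rows with an earlier datetime
# (this counts every earlier match, not only the previous n)
df['goals_to_date'] = df.apply(lambda row: np.sum(df[(df['team_id'] == row['team_id'])
                & (df['datetime'] < row['datetime'])]['goals_scored']), axis = 1)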
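
The row-by-row filter above rescans the whole frame for every row. One possible faster alternative is a grouped cumulative sum, sketched below. It assumes the `team_id`, `datetime` and `goals_scored` columns from the snippet above and that a team never has two matches with the same timestamp; the sample frame is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "team_id": ["City", "Utd", "City", "Utd", "City"],
    "datetime": pd.to_datetime([
        "2017-12-03 15:00", "2017-12-03 15:00", "2017-12-10 15:00",
        "2017-12-10 15:00", "2017-12-17 15:00",
    ]),
    "goals_scored": [1, 2, 0, 3, 4],
})

df = df.sort_values(["team_id", "datetime"])

# Running total per team, minus the current match's goals,
# leaves only the goals scored in earlier matches.
df["goals_to_date"] = (
    df.groupby("team_id")["goals_scored"].cumsum() - df["goals_scored"]
)

# Cross-check against the row-by-row filter on the same data
slow = df.apply(
    lambda row: df.loc[
        (df["team_id"] == row["team_id"]) & (df["datetime"] < row["datetime"]),
        "goals_scored",
    ].sum(),
    axis=1,
)
print(df.assign(row_by_row=slow))
```

Like the filter version, this counts every earlier match rather than only the previous n.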