2014-09-19 41 views

Python/Pandas: How do I select rows where a column equals the value of a different column in a different row?

Here is a sample of my data:

In [177]: df_data[['Date', 'TeamName', 'Opponent', 'ScoreOff']].head()
Out[177]:
                     Date              TeamName              Opponent  ScoreOff
4128  2005-09-08 00:00:00  New England Patriots       Oakland Raiders        30
4129  2005-09-08 00:00:00       Oakland Raiders  New England Patriots        20
4130  2005-09-11 00:00:00     Arizona Cardinals       New York Giants        19
4131  2005-09-11 00:00:00      Baltimore Ravens    Indianapolis Colts         7
4132  2005-09-11 00:00:00         Buffalo Bills        Houston Texans        22

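For each row, I need to set a new column ['OpponentScoreOff'] equal to the ScoreOff of that team's opponent on that day.

I have done it basically as follows, but it is slow, and I feel there must be a more Pythonic / vectorized way to do this.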

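g1 = df_data.groupby('Date')
for date, teams in g1:
    g2 = teams.groupby('TeamName')
    for teamname, game in g2:
        # Rows for this team on this date, and the opponent's row on the same date
        is_team = (df_data['TeamName'] == teamname) & (df_data['Date'] == date)
        is_opp = (df_data['Opponent'] == teamname) & (df_data['Date'] == date)
        # .loc assigns in place; .values avoids index alignment between the two rows
        df_data.loc[is_team, 'OpponentScoreOff'] = df_data.loc[is_opp, 'ScoreOff'].values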

It works, but it is slow. Is there a better way to do this?

Answer


You can use sorting to exploit the bijection between TeamName and Opponent on any given date. Consider the following:

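import pandas as pd
import numpy as np

# Order the rows by team within each date (DataFrame.sort was removed in later pandas; use sort_values)
df_data = df_data.sort_values(['Date', 'TeamName'])
# Ordering by opponent within each date puts each opponent's score in the team's position
opp_score = np.array(df_data.sort_values(['Date', 'Opponent'])['ScoreOff'])
df_data['OpponentScoreOff'] = opp_score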

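The np.array call is necessary to strip the DataFrame index. Otherwise, pandas would align the values by index label when they are put back into df_data, which would undo the reordering.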
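
To make the index point concrete, here is a minimal sketch on a four-row miniature of the data. The frame below is an assumption for illustration: the first three scores come from the sample above, while the New York Giants row and its score are invented so that the second game has both sides. The sketch also assumes one row per team per game and at most one game per team per date, which is what the sorting trick relies on. It uses `.to_numpy()` in place of `np.array`, which does the same job on recent pandas versions.

```python
import pandas as pd

# Illustrative miniature of the question's data.
# The Giants row (and its score of 42) is invented for this example.
df_data = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2005-09-08", "2005-09-08", "2005-09-11", "2005-09-11"]
        ),
        "TeamName": [
            "New England Patriots",
            "Oakland Raiders",
            "Arizona Cardinals",
            "New York Giants",
        ],
        "Opponent": [
            "Oakland Raiders",
            "New England Patriots",
            "New York Giants",
            "Arizona Cardinals",
        ],
        "ScoreOff": [30, 20, 19, 42],
    },
    index=[4128, 4129, 4130, 4131],
)

# Rows ordered by team within each date.
df_sorted = df_data.sort_values(["Date", "TeamName"])

# The same scores, ordered by opponent within each date.
by_opponent = df_data.sort_values(["Date", "Opponent"])["ScoreOff"]

# Index stripped: values are assigned by position, so each team
# receives the score of the team it played.
df_sorted["OpponentScoreOff"] = by_opponent.to_numpy()

# Index kept: pandas aligns on the row labels, so every row simply
# gets its own ScoreOff back and the reordering is lost.
df_sorted["AlignedByIndex"] = by_opponent

print(df_sorted[["TeamName", "ScoreOff", "OpponentScoreOff", "AlignedByIndex"]])
```

In this sketch the OpponentScoreOff column holds the other team's score, while AlignedByIndex is just a copy of ScoreOff. If a team could appear twice on one date, or a game had only one row, the positional pairing would no longer be reliable.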