2016-06-18 130 views

Pandas: look up values when the column names differ

I have 2 pandas DataFrames, df1 and df2:

df1:

Name  No
A     1
A     2
B     5

df2:

Player  Gender
A       F
B       M
C       F

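I want to create a new column sex in df1, filled with the corresponding values from the Gender column in df2. The columns to match on are Name in df1 and Player in df2.

Any help is much appreciated.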

Answer


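Use map with a Series built from df2 by calling set_index on Player, so that Player becomes the index and Gender holds the values: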

df1['sex'] = df1.Name.map(df2.set_index('Player')['Gender']) 
print (df1) 
    Name No sex 
0 A 1 F 
1 A 2 F 
2 B 5 M 
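
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the two sample frames from the question and also shows what seems to happen when a name has no match in df2: map leaves NaN rather than raising. The name "Z" and the variables lookup and missing are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({"Name": ["A", "A", "B"], "No": [1, 2, 5]})
df2 = pd.DataFrame({"Player": ["A", "B", "C"], "Gender": ["F", "M", "F"]})

# Series with Player as the index and Gender as the values
lookup = df2.set_index("Player")["Gender"]

# Each Name is looked up in the index of `lookup`
df1["sex"] = df1["Name"].map(lookup)
print(df1)

# "Z" is a hypothetical name that does not exist in df2;
# map returns NaN for it instead of raising an error
missing = pd.Series(["A", "Z"]).map(lookup)
print(missing)
```

Note that this relies on Player being unique in df2. If the real df2 has repeated players, it is probably safest to call drop_duplicates('Player') before set_index.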

This is the same as calling map with a dict:

d = df2.set_index('Player')['Gender'].to_dict() 
print (d) 
{'A': 'F', 'B': 'M', 'C': 'F'} 
df1['sex'] = df1.Name.map(d) 
print (df1) 
    Name No sex 
0 A 1 F 
1 A 2 F 
2 B 5 M 

Or use merge, then rename Gender to sex and drop the Player column:

print (pd.merge(df1,df2, left_on='Name', right_on='Player') 
     .rename(columns={'Gender':'sex'}) 
     .drop('Player', axis=1)) 

    Name No sex 
0 A 1 F 
1 A 2 F 
2 B 5 M 
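
If df2 has many other columns, one possible variation is to select only the two columns needed before merging, so there is nothing extra to drop afterwards. The sketch below also passes how='left', since merge defaults to an inner join and would otherwise discard rows of df1 that have no match. The Team column and the extra row "D" are hypothetical additions used only to illustrate this.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical unmatched row ("D")
df1 = pd.DataFrame({"Name": ["A", "A", "B", "D"], "No": [1, 2, 5, 7]})

# "Team" is a hypothetical extra column standing in for the many unused ones
df2 = pd.DataFrame({
    "Player": ["A", "B", "C"],
    "Gender": ["F", "M", "F"],
    "Team": ["X", "Y", "Z"],
})

merged = (
    df1.merge(
        df2[["Player", "Gender"]],  # keep only the key and the value column
        how="left",                 # keep every row of df1, matched or not
        left_on="Name",
        right_on="Player",
    )
    .drop(columns="Player")
    .rename(columns={"Gender": "sex"})
)
print(merged)
```

The row for "D" is kept with NaN in sex, and Team never enters the result.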

The first approach (map) is faster on this sample:

In [46]: %timeit (pd.merge(df1,df2, left_on='Name', right_on='Player').rename(columns={'Gender':'sex'}).drop('Player', axis=1)) 
The slowest run took 4.53 times longer than the fastest. This could mean that an intermediate result is being cached. 
100 loops, best of 3: 2.53 ms per loop 

In [47]: %timeit df1.Name.map(df2.set_index('Player')['Gender']) 
The slowest run took 4.78 times longer than the fastest. This could mean that an intermediate result is being cached. 
1000 loops, best of 3: 882 µs per loop 
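
The timings above come from the three-row sample, so they mostly measure fixed overhead. To check how the two approaches compare on larger data, something like the following sketch could be used outside IPython. The frame sizes, the random seed, and the function names with_map and with_merge are arbitrary assumptions, and results will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data; sizes and seed are arbitrary choices for illustration
rng = np.random.default_rng(0)
players = [f"p{i}" for i in range(1000)]
df2 = pd.DataFrame({
    "Player": players,
    "Gender": rng.choice(["F", "M"], size=len(players)),
})
df1 = pd.DataFrame({
    "Name": rng.choice(players, size=100_000),
    "No": np.arange(100_000),
})


def with_map():
    # Lookup through a Series indexed by Player
    return df1["Name"].map(df2.set_index("Player")["Gender"])


def with_merge():
    # Same chain as in the answer: merge, rename, drop the key column
    return (pd.merge(df1, df2, left_on="Name", right_on="Player")
              .rename(columns={"Gender": "sex"})
              .drop("Player", axis=1))


# Total seconds for 10 runs of each approach
print("map:  ", timeit.timeit(with_map, number=10))
print("merge:", timeit.timeit(with_merge, number=10))
```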

Thanks @jezrael. Is there a way to create the column 'sex' in df1 without merging the 2 DataFrames? My real df2 has many columns, so I would have to drop quite a few unused ones. – Square9627


I think you can use map, please see the edit. – jezrael