A better way to replace values in a DataFrame from a large dictionary

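I have written some code that uses a dictionary to replace values in a DataFrame with values from another frame, and it works. However, I am running it on some large files where the dictionary can get very long: several thousand pairs. On those files the code runs very slowly, and on a few occasions it has also run out of memory.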

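I am fairly sure my approach is far from optimal and that there must be a faster way to do this. I have put together a simple example that does what I need but is slow on large amounts of data. I hope someone knows a simpler way to do this.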

import pandas as pd 

#Frame with data where I want to replace the 'id' with the name from df2 
df1 = pd.DataFrame({'id' : [1, 2, 3, 4, 5, 3, 5, 9], 'values' : [12, 32, 42, 51, 23, 14, 111, 134]}) 

#Frame containing names linked to ids 
df2 = pd.DataFrame({'id' : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'name' : ['id1', 'id2', 'id3', 'id4', 'id5', 'id6', 'id7', 'id8', 'id9', 'id10']}) 

#My current "slow" way of doing this. 

#Starts by creating a dictionary from df2 
#Need to create dictionaries from the domain and banners tables to link ids 
df2_dict = dict(zip(df2['id'], df2['name'])) 

#and then uses the dict to replace the ids with name in df1 
df1.replace({'id' : df2_dict}, inplace=True)


2016-11-10 Siesta

I think you can use map with the Series converted by to_dict. You get NaN for any value that does not exist in df2:

df1['id'] = df1.id.map(df2.set_index('id')['name'].to_dict()) 
print (df1) 
    id  values
0  id1      12
1  id2      32
2  id3      42
3  id4      51
4  id5      23
5  id3      14
6  id5     111
7  id9     134
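
As a side note, map also accepts a Series as the lookup, so the to_dict step may not be strictly necessary. The sketch below reuses the sample frames from the question; the variable names lookup, mapped_from_series and mapped_from_dict are only illustrative, and it assumes the ids in df2 are unique (a Series lookup with a duplicated index raises an error). Which variant is faster on large data is something to measure rather than assume.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5, 3, 5, 9],
                    'values': [12, 32, 42, 51, 23, 14, 111, 134]})
df2 = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                    'name': ['id1', 'id2', 'id3', 'id4', 'id5',
                             'id6', 'id7', 'id8', 'id9', 'id10']})

# Lookup Series: index holds the ids, values hold the names.
# Assumption: every id appears only once in df2.
lookup = df2.set_index('id')['name']

# Variant 1: pass the Series itself to map.
mapped_from_series = df1['id'].map(lookup)

# Variant 2: convert to a plain dict first, as in the answer.
mapped_from_dict = df1['id'].map(lookup.to_dict())

print(mapped_from_series.tolist())
print(mapped_from_dict.tolist())
```

Both variants leave the 'values' column untouched, because only the 'id' Series is mapped.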

Or use replace, which keeps the original value from df1 when it has no match in df2:

df1['id'] = df1.id.replace(df2.set_index('id')['name']) 
print (df1) 
    id  values
0  id1      12
1  id2      32
2  id3      42
3  id4      51
4  id5      23
5  id3      14
6  id5     111
7  id9     134

Sample:

#Frame with data where I want to replace the 'id' with the name from df2 
df1 = pd.DataFrame({'id' : [1, 2, 3, 4, 5, 3, 5, 9], 'values' : [12, 32, 42, 51, 23, 14, 111, 134]}) 
print (df1) 
#Frame containing names linked to ids 
df2 = pd.DataFrame({'id' : [1, 2, 3, 4, 6, 7, 8, 9, 10], 'name' : ['id1', 'id2', 'id3', 'id4', 'id6', 'id7', 'id8', 'id9', 'id10']}) 
print (df2) 

df1['new_map'] = df1.id.map(df2.set_index('id')['name'].to_dict()) 
df1['new_replace'] = df1.id.replace(df2.set_index('id')['name']) 
print (df1) 
   id  values new_map new_replace
0   1      12     id1         id1
1   2      32     id2         id2
2   3      42     id3         id3
3   4      51     id4         id4
4   5      23     NaN           5
5   3      14     id3         id3
6   5     111     NaN           5
7   9     134     id9         id9
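
If you want the behaviour of replace (unmatched ids stay as they are) while still going through map, one possible combination is map followed by fillna with the original column. This is only a sketch on the sample data above, not something benchmarked here; the astype(object) call is an assumption added so that strings and integers can sit in the same column regardless of pandas version.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5, 3, 5, 9],
                    'values': [12, 32, 42, 51, 23, 14, 111, 134]})
# df2 has no entry for id 5, as in the sample above.
df2 = pd.DataFrame({'id': [1, 2, 3, 4, 6, 7, 8, 9, 10],
                    'name': ['id1', 'id2', 'id3', 'id4', 'id6',
                             'id7', 'id8', 'id9', 'id10']})

lookup = df2.set_index('id')['name'].to_dict()

# map gives NaN for ids missing from the lookup ...
mapped = df1['id'].map(lookup)

# ... and fillna puts the original id back in those rows.
# astype(object) keeps a dtype that can hold both str and int values.
df1['id'] = mapped.astype(object).fillna(df1['id'])

print(df1)
```

Rows 4 and 6 keep the original id 5, and the 'values' column is unchanged.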


2016-11-10 13:22:34 jezrael

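This seems to work. But is there a way to keep the 'values' column in df1? I can't figure out how to write this so that it only changes the id column and keeps the values column. Nvm, just figured it out. It can be done like this: df1['id'].replace(df2.set_index('id')['name'], inplace=True) – Siesta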

Sorry, I forgot to add the assignment; please see my updated answer. – jezrael
