Map a column through another DataFrame to create a new column

I have a DataFrame, df1:

id  store  address
1   100    xyz
2   200    qwe
3   300    asd
4   400    zxc
5   500    bnm

I also have another DataFrame, df2:

serialNo  store_code  warehouse
1         300         Land
2         500         Sea
3         100         Land
4         200         Sea
5         400         Land

I want my final DataFrame to look like this:

id  store  address  warehouse
1   100    xyz      Land
2   200    qwe      Sea
3   300    asd      Land
4   400    zxc      Land
5   500    bnm      Sea

In other words, I want to create a new column in one DataFrame by looking up its values in the other.
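For anyone who wants to reproduce this, the two frames could be built roughly as below. This is only a sketch: the post shows printed output, not the construction code, so the integer dtype of the store codes is an assumption.

```python
import pandas as pd

# Rebuild the two example frames from the tables above.
# Assumption: store codes are integers (the post only shows printed values).
df1 = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'store': [100, 200, 300, 400, 500],
    'address': ['xyz', 'qwe', 'asd', 'zxc', 'bnm'],
})
df2 = pd.DataFrame({
    'serialNo': [1, 2, 3, 4, 5],
    'store_code': [300, 500, 100, 200, 400],
    'warehouse': ['Land', 'Sea', 'Land', 'Sea', 'Land'],
})
```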

Source

2017-09-05 Shubham

Option 1

Use df.merge

# Parentheses replace the backslash continuation, which fails if anything follows the backslash.
out = (df1.merge(df2, left_on='store', right_on='store_code')
          [['id', 'store', 'address', 'warehouse']])
print(out)

   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea
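One thing to keep in mind is that merge defaults to an inner join, so a store in df1 with no match in df2 would silently disappear from the result. A minimal sketch of a left join that keeps such rows follows; the store code 999 and the small frames are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical data: store 999 has no entry in df2.
df1 = pd.DataFrame({'id': [1, 2, 3], 'store': [100, 200, 999],
                    'address': ['xyz', 'qwe', 'new']})
df2 = pd.DataFrame({'serialNo': [1, 2], 'store_code': [200, 100],
                    'warehouse': ['Sea', 'Land']})

# how='left' keeps every row of df1; stores missing from df2 get NaN.
out = (df1.merge(df2[['store_code', 'warehouse']],
                 left_on='store', right_on='store_code', how='left')
          .drop(columns='store_code'))
print(out)
```

Selecting only the two needed columns of df2 before merging avoids having to drop serialNo afterwards.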

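Option 2

Use pd.concat and df.sort_values

out = pd.concat(
    [df1.sort_values('store').reset_index(drop=True),
     df2.sort_values('store_code')[['warehouse']].reset_index(drop=True)],
    axis=1,  # concat aligns on the index, so both pieces are reset to 0..n-1
)
print(out)

   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea

The first sort call is redundant if df1 is already sorted by store; in that case you can remove it. Note that this approach only works when both frames contain the same store codes, one row each.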

Option 3

Use df.replace

s = df1.store.replace(df2.set_index('store_code')['warehouse'])
print(s)
0    Land
1     Sea
2    Land
3    Land
4     Sea

df1['warehouse'] = s
print(df1)

   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea

Alternatively, build the mapping explicitly. This is useful if you want to reuse it later.

mapping = dict(df2[['store_code', 'warehouse']].values)  # separate step
df1['warehouse'] = df1.store.replace(mapping)  # or: df1.store.map(mapping)
print(df1)

   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea
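The replace and map variants behave differently when a store code is missing from the mapping, which may matter when choosing between them. A small sketch, using a made-up code 999 that is not in the question's data:

```python
import pandas as pd

store = pd.Series([100, 200, 999])   # 999 is a hypothetical, unmapped code
mapping = {100: 'Land', 200: 'Sea'}

replaced = store.replace(mapping)  # unmatched value 999 is left unchanged
mapped = store.map(mapping)        # unmatched value becomes NaN

print(replaced.tolist())
print(mapped.tolist())
```

With map, missing lookups are easy to spot via isna(); with replace, the original store code stays in the column.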

Source

2017-09-05 08:04:57

Use map or join:

df1['warehouse'] = df1['store'].map(df2.set_index('store_code')['warehouse'])
print(df1)
   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea

# Alternative: apply to the original df1, before a warehouse column has been added,
# otherwise join raises an error about overlapping columns.
df1 = df1.join(df2.set_index('store_code'), on='store').drop(columns='serialNo')
print(df1)
   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea
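If only the warehouse column is wanted, one variation is to join against a named Series instead of the whole of df2, so there is nothing to drop afterwards. This is a sketch on the question's data, not part of the original answer; the name lookup is introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                    'store': [100, 200, 300, 400, 500],
                    'address': ['xyz', 'qwe', 'asd', 'zxc', 'bnm']})
df2 = pd.DataFrame({'serialNo': [1, 2, 3, 4, 5],
                    'store_code': [300, 500, 100, 200, 400],
                    'warehouse': ['Land', 'Sea', 'Land', 'Sea', 'Land']})

# A named Series indexed by store_code; join uses its name as the new column.
lookup = df2.set_index('store_code')['warehouse']
out = df1.join(lookup, on='store')
print(out)
```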

Source

2017-09-05 07:55:39 jezrael

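I get this error when running the .map code on a similar dataset: 'Reindexing only valid with uniquely valued Index objects' – Shubham

I think the problem is duplicated values in the 'store_code' column of 'df2'. So you need df1['store'].map(df2.drop_duplicates('store_code').set_index('store_code')['warehouse']) – jezrael

Correct! Thanks :) – Shubham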
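The fix discussed in the comments above can be sketched as follows. The small frames with a repeated store_code are made up for illustration; the check on index uniqueness is an addition here, not something from the thread.

```python
import pandas as pd

# Hypothetical data: store_code 100 appears twice in df2.
df1 = pd.DataFrame({'store': [100, 200]})
df2 = pd.DataFrame({'store_code': [100, 100, 200],
                    'warehouse': ['Land', 'Land', 'Sea']})

lookup = df2.set_index('store_code')['warehouse']
print(lookup.index.is_unique)  # a non-unique index is what .map cannot handle

# Keep the first row per store_code, then build the lookup Series.
deduped = (df2.drop_duplicates('store_code')
              .set_index('store_code')['warehouse'])
df1['warehouse'] = df1['store'].map(deduped)
print(df1)
```

Note that drop_duplicates keeps the first occurrence by default, so if the duplicated rows disagree on the warehouse, the first one wins silently.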
