Searching in a pandas DataFrame

I have two DataFrames:

import pandas as pd 
raw_data = { 
     'employee_id': ['4444', '5555', '6666','7777','8888'], 
     'first_name': ['aa', 'Jason', 'Tina', 'Jake', 'Amy'], 
     'last_name': ['Miller', 'Millers', 'Ali', 'Milner', 'Cooze'], 
     'age': [42, 42, 36, 24, 73], 
} 
df1 = pd.DataFrame(raw_data, columns = ['employee_id','first_name', 'last_name', 'age']) 


raw_data1 = {'employee_id': ['4444', '5555', '6666','7777'], 
    'ip': ['192.168.1.101', '192.168.1.102','192.168.1.103','192.168.1.104'], 

} 

df2 = pd.DataFrame(raw_data1, columns = ['employee_id', 'ip'])

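I want to look up (compare) df2['employee_id'] in df1 and, where the values match, add df2['ip'] to df1:

print(df2['ip'].where(df2['employee_id'] == df1['employee_id']))

But this is not the right way to do it; it raises:

ValueError: Can only compare identically-labeled Series objects

Any suggestions on this problem would be greatly appreciated.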
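For context, a minimal sketch of why the comparison fails: `==` between two Series requires identical index labels, and here `df1` has five rows while `df2` has four. `Series.isin` tests membership by value without aligning labels, so it works across frames of different lengths. The frames below are trimmed copies of the ones in the question, and the names `raised` and `has_ip` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({"employee_id": ["4444", "5555", "6666", "7777", "8888"]})
df2 = pd.DataFrame({
    "employee_id": ["4444", "5555", "6666", "7777"],
    "ip": ["192.168.1.101", "192.168.1.102", "192.168.1.103", "192.168.1.104"],
})

# Element-wise == needs identically labelled Series. The indexes differ
# here (0..3 versus 0..4), so pandas raises ValueError.
try:
    df2["employee_id"] == df1["employee_id"]
    raised = False
except ValueError:
    raised = True

# isin() checks membership by value and ignores index labels.
has_ip = df1["employee_id"].isin(df2["employee_id"])
```

`has_ip` is a boolean Series aligned with `df1`, so it can be used directly as a row filter, for example `df1[has_ip]`.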

Source

2017-08-08 jojo

This is an updated answer, posted after someone deleted an answer that used what I think is a better solution.

on = "employee_id" 
df3 = df1.set_index(on).join(df2.set_index(on)).fillna("IP missing") 
df3["ip"].to_dict() 

employee_id  first_name  last_name  age             ip
4444                 aa     Miller   42  192.168.1.101
5555              Jason    Millers   42  192.168.1.102
6666               Tina        Ali   36  192.168.1.103
7777               Jake     Milner   24  192.168.1.104
8888                Amy      Cooze   73     IP missing

{'4444': '192.168.1.101', 
'5555': '192.168.1.102', 
'6666': '192.168.1.103', 
'7777': '192.168.1.104', 
'8888': 'IP missing'}

Previous answer:

pd.merge(df1,df2,on="employee_id")

https://pandas.pydata.org/pandas-docs/stable/merging.html#database-style-dataframe-joining-merging

pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=True, suffixes=('_x', '_y'), copy=True, indicator=False)

gives

   employee_id  first_name  last_name  age             ip
0         4444          aa     Miller   42  192.168.1.101
1         5555       Jason    Millers   42  192.168.1.102
2         6666        Tina        Ali   36  192.168.1.103
3         7777        Jake     Milner   24  192.168.1.104

Perhaps you want something like this:

pd.merge(df1,df2,on="employee_id").set_index("employee_id")["ip"].to_dict() 

{'4444': '192.168.1.101', 
'5555': '192.168.1.102', 
'6666': '192.168.1.103', 
'7777': '192.168.1.104'}
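The `how` parameter of `pd.merge` controls which keys survive the join. A minimal sketch, assuming trimmed copies of the question's frames, that compares the default inner join with a left join keeping employee 8888. `indicator=True` adds a `_merge` column recording where each row came from; the variable names `inner`, `left`, and `unmatched` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({
    "employee_id": ["4444", "5555", "6666", "7777", "8888"],
    "first_name": ["aa", "Jason", "Tina", "Jake", "Amy"],
})
df2 = pd.DataFrame({
    "employee_id": ["4444", "5555", "6666", "7777"],
    "ip": ["192.168.1.101", "192.168.1.102", "192.168.1.103", "192.168.1.104"],
})

# Inner join (the default): only ids present in both frames are kept.
inner = pd.merge(df1, df2, on="employee_id")

# Left join: every row of df1 is kept; ip is NaN where df2 has no match.
left = pd.merge(df1, df2, on="employee_id", how="left", indicator=True)

# Rows marked "left_only" are the employees without an ip.
unmatched = left.loc[left["_merge"] == "left_only", "employee_id"].tolist()
```

The inner join drops the unmatched employee, while the left join keeps that row with a missing `ip`, which is closer to the output of the `join` version at the top of this answer.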

Source

2017-08-08 20:33:13

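What if I want to add the matching values to df1: add a column and insert the matching ip, leaving non-matching rows empty? Thanks – jojo

@jojo Reassign it like this: on = "employee_id"; df1 = df1.set_index(on).join(df2.set_index(on)).reset_index() –

Your data science knowledge is great. Could you recommend some books or video tutorials? I am a Python developer but completely new to data science. Thanks – jojo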
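The reassignment suggested in the comment above, written out as a runnable sketch with the question's frames. Unmatched rows get `NaN` in `ip` after the join; the final `fillna("")` is an assumption about what "empty" should mean here and can be dropped if `NaN` is acceptable.

```python
import pandas as pd

df1 = pd.DataFrame({
    "employee_id": ["4444", "5555", "6666", "7777", "8888"],
    "first_name": ["aa", "Jason", "Tina", "Jake", "Amy"],
    "last_name": ["Miller", "Millers", "Ali", "Milner", "Cooze"],
    "age": [42, 42, 36, 24, 73],
})
df2 = pd.DataFrame({
    "employee_id": ["4444", "5555", "6666", "7777"],
    "ip": ["192.168.1.101", "192.168.1.102", "192.168.1.103", "192.168.1.104"],
})

on = "employee_id"

# Join on the shared key, then turn the key back into an ordinary column.
df1 = df1.set_index(on).join(df2.set_index(on)).reset_index()

# Assumption: represent a missing ip as an empty string instead of NaN.
df1["ip"] = df1["ip"].fillna("")
```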

Use merge

In [1286]: df1.merge(df2, on='employee_id') 
Out[1286]: 
   employee_id  first_name  last_name  age             ip
0         4444          aa     Miller   42  192.168.1.101
1         5555       Jason    Millers   42  192.168.1.102
2         6666        Tina        Ali   36  192.168.1.103
3         7777        Jake     Milner   24  192.168.1.104

Source

2017-08-08 20:31:55 Zero

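I have wondered for a while: how do you copy the output so that it comes out so nicely formatted? –

I mostly use Jupyter; do you know the trick there? What are the steps? Either way, you get my upvote, +1. –

@AntonvBR, it's 'IPython'; it does that for us automatically... – MaxU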
