How to do a conditional vlookup in a pandas DataFrame

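I'm trying to figure out how to do a vlookup that picks out the latest price to fill in a second table. See the example below. For item #1, the latest price is in month 6 (= $6), while for item #2 it is in month 5 (= $4). What is the best way to fill in table B? Note: if the item is new, its item_id may not be found in table A.

Any guidance? Many thanks.

Table A (reference)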

| Item_ID | Month | Price | 
|---------|-------|-------| 
| 1  | 4  | 10 | 
| 1  | 5  | 8  | 
| 1  | 6  | 6  | 
| 2  | 5  | 4  |

Table B (to be filled)

| Shop_ID | Item_ID | Price | 
|---------|---------|-------| 
| 1  | 1  | 6  | 
| 1  | 2  | 4  |

Source

2017-11-25 Harris

You can first find the latest row for each item, then merge that into the table you are building:

import pandas

tableA = pandas.DataFrame({'Item_ID': {0: 1, 1: 1, 2: 1, 3: 2},
                           'Month': {0: 4, 1: 5, 2: 6, 3: 5},
                           'Price': {0: 10, 1: 8, 2: 6, 3: 4}})
tableB = pandas.DataFrame({'Item_ID': {0: 1, 1: 2},
                           'Price': {0: 6, 1: 4},
                           'Shop_ID': {0: 1, 1: 1}})

# For each Item_ID, keep the row with the largest Month (the latest price)
latest = tableA.loc[tableA.groupby('Item_ID')['Month'].idxmax()]
# Attach the latest price to the shop/item pairs in tableB
result = tableB[['Shop_ID', 'Item_ID']].merge(latest[['Item_ID', 'Price']],
                                              on='Item_ID')

This produces

   Shop_ID  Item_ID  Price
0        1        1      6
1        1        2      4
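
The question mentions that a new item may be missing from table A. With the default inner merge, such a row would be dropped from the result. A minimal sketch of one way to keep it, assuming a hypothetical item 3 that has no price history: pass `how='left'` so the row survives with a missing price.

```python
import pandas as pd

table_a = pd.DataFrame({
    "Item_ID": [1, 1, 1, 2],
    "Month": [4, 5, 6, 5],
    "Price": [10, 8, 6, 4],
})
# Item 3 is a hypothetical new item with no rows in table A.
table_b = pd.DataFrame({"Shop_ID": [1, 1, 1], "Item_ID": [1, 2, 3]})

# Row with the largest Month per Item_ID, reduced to the lookup columns.
latest_idx = table_a.groupby("Item_ID")["Month"].idxmax()
latest = table_a.loc[latest_idx, ["Item_ID", "Price"]]

# how="left" keeps every row of table_b; unmatched items get NaN as Price.
result = table_b.merge(latest, on="Item_ID", how="left")
print(result)
```

Because of the NaN, the `Price` column comes back as float. Whether to leave it empty or fill it with a default depends on how new items should be handled.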

Source

2017-11-25 12:47:52 chthonicdaemon

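To fill the Price column in df2, we can build a pandas Series that maps each Item_ID to its price. Use drop_duplicates to keep only the last row for each Item_ID, then call set_index and select the Price column to get the Series. Finally, use map to create the new column.

Full example: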

import pandas as pd

# Sample data
data1 = dict(Item_ID=[1, 1, 1, 2], Month=[4, 5, 6, 5], Price=[10, 8, 6, 4])
data2 = dict(Shop_ID=[1, 1], Item_ID=[1, 2])

# Create dfs
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# Create a series with Item_ID as index and Price as value
# (keep='last' relies on the rows already being in month order)
s = df1.drop_duplicates('Item_ID', keep='last').set_index('Item_ID')['Price']

# Create new column in df2
df2['Price'] = df2['Item_ID'].map(s)
print(df2)

   Shop_ID  Item_ID  Price
0        1        1      6
1        1        2      4

More details

If the rows are not already in month order, use sort_values first:

s = (df1.sort_values(['Item_ID','Month']) 
     .drop_duplicates('Item_ID', keep='last') 
     .set_index('Item_ID')['Price'])

The Series s looks like this:

Item_ID
1    6
2    4
Name: Price, dtype: int64
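
To illustrate why the sort matters, here is a small sketch with the same prices but with the rows deliberately shuffled, plus a hypothetical item 3 that is absent from df1 (both are assumptions made for the example, not part of the original data). Without sorting, keep='last' picks whichever row happens to come last; with map, the unknown item simply gets NaN.

```python
import pandas as pd

# Same prices as the example, but rows are out of month order on purpose.
df1 = pd.DataFrame({
    "Item_ID": [1, 2, 1, 1],
    "Month": [6, 5, 4, 5],
    "Price": [6, 4, 10, 8],
})
# Item 3 is a hypothetical new item that does not appear in df1.
df2 = pd.DataFrame({"Shop_ID": [1, 1, 1], "Item_ID": [1, 2, 3]})

# Without sorting, the "last" row for item 1 is month 5, not month 6.
unsorted_s = df1.drop_duplicates("Item_ID", keep="last").set_index("Item_ID")["Price"]

# Sorting by Item_ID and Month makes the last row per item the latest month.
s = (df1.sort_values(["Item_ID", "Month"])
        .drop_duplicates("Item_ID", keep="last")
        .set_index("Item_ID")["Price"])

# map leaves NaN where Item_ID has no entry in s.
df2["Price"] = df2["Item_ID"].map(s)
print(df2)
```

If a default is preferred over NaN for unknown items, `fillna` can be chained after `map`; which default makes sense is left open here.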

Source

2017-11-25 12:27:32 jezrael

Nice answer. I took the liberty of adding some sample data and a few other bits. +1 –

@AntonvBR - Thank you very much. – jezrael
