How to assign labels based on a list in a Python pandas DataFrame?

I have two DataFrames: one is 'recipe', which holds combinations of ingredients, and the other is 'like', which holds the popular combinations.

import pandas as pd

recipe = pd.DataFrame({'A': ['chicken','beef','pork','egg', 'chicken', 'egg', 'beef'], 
         'B': ['sweet', 'hot', 'salty', 'hot', 'sweet', 'salty', 'hot']}) 
recipe 
         A      B
0  chicken  sweet
1     beef    hot
2     pork  salty
3      egg    hot
4  chicken  sweet
5      egg  salty
6     beef    hot

like = pd.DataFrame({'A':['beef', 'egg'], 'B':['hot', 'salty']}) 
like 
      A      B
0  beef    hot
1   egg  salty

How can I add a column 'C' to recipe, so that a row gets the value 'yes' if its combination is listed in 'like', and 'no' otherwise?

The result I want is

recipe 
         A      B    C
0  chicken  sweet   no
1     beef    hot  yes
2     pork  salty   no
3      egg    hot   no
4  chicken  sweet   no
5      egg  salty  yes
6     beef    hot  yes

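The problem is that both of my DataFrames are large. I can't manually pick out the items in 'like' and assign the 'yes' label in 'recipe'. Is there a simple way to do this?

Source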

2016-03-24 xirururu

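Can an item in 'A' be 'beef' with, say, 'salty' in 'B', resulting in a mismatch? – Leb

@Leb, if 'A' is 'beef' and 'B' is 'salty', then I would assign the label 'no'. No mismatch will occur. – xirururu

You can use merge and numpy.where: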

import numpy as np
import pandas as pd

# Left-merge on both columns; indicator=True adds a _merge column
df = pd.merge(recipe, like, on=['A','B'], indicator=True, how='left')
print(df)
         A      B     _merge
0  chicken  sweet  left_only
1     beef    hot       both
2     pork  salty  left_only
3      egg    hot  left_only
4  chicken  sweet  left_only
5      egg  salty       both
6     beef    hot       both

df['C'] = np.where(df['_merge'] == 'both', 'yes', 'no') 

print(df[['A','B','C']])
         A      B    C
0  chicken  sweet   no
1     beef    hot  yes
2     pork  salty   no
3      egg    hot   no
4  chicken  sweet   no
5      egg  salty  yes
6     beef    hot  yes
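One thing to watch with the merge approach: if 'like' contains the same combination more than once, a left merge repeats the matching rows of 'recipe'. The sketch below is only an illustration under that assumption; the duplicated ('beef', 'hot') row in 'like' and the names naive and labeled are made up for the example. Dropping duplicates from 'like' before merging keeps one output row per recipe row.

```python
import numpy as np
import pandas as pd

recipe = pd.DataFrame({
    'A': ['chicken', 'beef', 'pork', 'egg', 'chicken', 'egg', 'beef'],
    'B': ['sweet', 'hot', 'salty', 'hot', 'sweet', 'salty', 'hot'],
})
# Hypothetical variant of 'like' with a repeated ('beef', 'hot') row.
like = pd.DataFrame({'A': ['beef', 'egg', 'beef'],
                     'B': ['hot', 'salty', 'hot']})

# Without deduplication, each duplicate in 'like' repeats the matching rows.
naive = pd.merge(recipe, like, on=['A', 'B'], how='left', indicator=True)

# Deduplicating first keeps exactly one output row per row of 'recipe'.
df = pd.merge(recipe, like.drop_duplicates(), on=['A', 'B'],
              how='left', indicator=True)
df['C'] = np.where(df['_merge'] == 'both', 'yes', 'no')
labeled = df.drop(columns='_merge')

print(len(naive), len(labeled))
print(labeled)
```

If 'like' is already known to be free of duplicates, the drop_duplicates call is unnecessary but harmless.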

Comparing with df['_merge'] == 'both' is faster than using np.in1d:

In [460]: %timeit np.where(np.in1d(df['_merge'],'both'), 'yes', 'no') 
100 loops, best of 3: 2.22 ms per loop 

In [461]: %timeit np.where(df['_merge'] == 'both', 'yes', 'no') 
1000 loops, best of 3: 652 µs per loop

Source

2016-03-24 12:05:49 jezrael

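Thanks! I kept thinking about 'concat' and 'join' but couldn't find a solution. 'merge' is the answer. :D – xirururu

You can add a column C filled with 'yes' to like, then merge recipe with like. Matching rows will have yes in column C, and rows without a match will have NaN. You can then use fillna to replace the NaNs with 'no':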

import pandas as pd 
recipe = pd.DataFrame({'A': ['chicken','beef','pork','egg', 'chicken', 'egg', 'beef'], 
         'B': ['sweet', 'hot', 'salty', 'hot', 'sweet', 'salty', 'hot']}) 

like = pd.DataFrame({'A':['beef', 'egg'], 'B':['hot', 'salty']}) 
like['C'] = 'yes' 
result = pd.merge(recipe, like, how='left').fillna('no') 
print(result)

which yields

         A      B    C
0  chicken  sweet   no
1     beef    hot  yes
2     pork  salty   no
3      egg    hot   no
4  chicken  sweet   no
5      egg  salty  yes
6     beef    hot  yes
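Note that calling fillna('no') on the whole merged frame fills every NaN, not only those in column C. That is fine for the example data, but if 'recipe' has missing values of its own it may be safer to fill column C alone. A minimal sketch, assuming a hypothetical extra column 'note' with one missing value (the column and the name tagged are invented for illustration):

```python
import pandas as pd

# Hypothetical recipe table with a missing value in an extra column 'note'.
recipe = pd.DataFrame({'A': ['chicken', 'beef', 'egg'],
                       'B': ['sweet', 'hot', 'salty'],
                       'note': ['grill', None, 'boil']})
like = pd.DataFrame({'A': ['beef', 'egg'], 'B': ['hot', 'salty']})

# Tag every liked combination, then left-merge on the shared key columns.
tagged = like.assign(C='yes')
result = pd.merge(recipe, tagged, on=['A', 'B'], how='left')

# Fill only column C so the missing 'note' stays missing.
result['C'] = result['C'].fillna('no')
print(result)
```

Using like.assign(C='yes') instead of like['C'] = 'yes' also leaves the original 'like' frame unchanged.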

Source

2016-03-24 12:23:30 unutbu

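You can use set_value, matching on A and B, like this (A and B are checked independently here, so a row such as egg/hot also gets yes):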

recipe.set_value(recipe[recipe.A.isin(like.A) & recipe.B.isin(like.B)].index,'C','yes') 
recipe.fillna('no')

which gives you:

         A      B    C
0  chicken  sweet   no
1     beef    hot  yes
2     pork  salty   no
3      egg    hot  yes
4  chicken  sweet   no
5      egg  salty  yes
6     beef    hot  yes
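To label only exact (A, B) pairs without a merge, one option is to compare whole rows instead of single columns. The following is a sketch rather than the answer's own method: it builds a MultiIndex from the two columns and uses MultiIndex.isin, then assigns the column directly, since set_value was removed in pandas 1.0. The names loose, recipe_keys, like_pairs and mask are illustrative.

```python
import numpy as np
import pandas as pd

recipe = pd.DataFrame({
    'A': ['chicken', 'beef', 'pork', 'egg', 'chicken', 'egg', 'beef'],
    'B': ['sweet', 'hot', 'salty', 'hot', 'sweet', 'salty', 'hot'],
})
like = pd.DataFrame({'A': ['beef', 'egg'], 'B': ['hot', 'salty']})

# Column-wise check: true whenever A appears in like.A and B appears in like.B,
# even if that particular pairing is not a row of 'like'.
loose = recipe['A'].isin(like['A']) & recipe['B'].isin(like['B'])

# Row-wise check: one (A, B) key per row, matched against the liked pairs.
recipe_keys = pd.MultiIndex.from_frame(recipe[['A', 'B']])
like_pairs = list(zip(like['A'], like['B']))
mask = recipe_keys.isin(like_pairs)

# Plain column assignment instead of set_value.
recipe['C'] = np.where(mask, 'yes', 'no')
print(recipe)
```

The two masks differ only where the columns match separately but the pair does not, which in this data is the egg/hot row.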

Note: these results do not mean that my answer is better than the others, or vice versa.

Using set_value:

%timeit recipe.set_value(recipe[recipe.A.isin(like.A) & recipe.B.isin(like.B)].index,'C','yes'); recipe.fillna('no') 
100 loops, best of 3: 2.69 ms per loop

Using merge and creating a new df:

%timeit df = pd.merge(recipe, like, on=['A','B'], indicator=True, how='left'); df['C'] = np.where(df['_merge'] == 'both', 'yes', 'no') 
100 loops, best of 3: 8.42 ms per loop

Timing only the labeling step, with the merge already done:

%timeit df['C'] = np.where(df['_merge'] == 'both', 'yes', 'no') 
1000 loops, best of 3: 187 µs per loop

Again, it really depends on what you are timing. Be careful about duplicating your data.

Source
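To make the "what you are timing" point concrete, the two measurements can be separated with the standard timeit module. This is a rough sketch only: the input is the example data repeated 1,000 times (an assumption, not the asker's real data), the function names are made up, and the absolute numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({
    'A': ['chicken', 'beef', 'pork', 'egg', 'chicken', 'egg', 'beef'],
    'B': ['sweet', 'hot', 'salty', 'hot', 'sweet', 'salty', 'hot'],
})
# Hypothetical larger input: the example rows repeated 1,000 times.
recipe = pd.concat([base] * 1000, ignore_index=True)
like = pd.DataFrame({'A': ['beef', 'egg'], 'B': ['hot', 'salty']})


def merge_and_label():
    # Full cost: build the merged frame, then label it.
    df = pd.merge(recipe, like, on=['A', 'B'], how='left', indicator=True)
    df['C'] = np.where(df['_merge'] == 'both', 'yes', 'no')
    return df


# Merge once up front so the second timing covers only the labeling step.
merged = pd.merge(recipe, like, on=['A', 'B'], how='left', indicator=True)


def label_only():
    return np.where(merged['_merge'] == 'both', 'yes', 'no')


t_full = timeit.timeit(merge_and_label, number=20)
t_label = timeit.timeit(label_only, number=20)
print(f"merge + label: {t_full:.4f} s, label only: {t_label:.4f} s (20 runs each)")
```

Only the first figure is comparable with an approach that does not need a merge at all.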

2016-03-24 12:26:06 Leb

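set_value isn't really idiomatic; just use assignment: recipe['C'] = ... – Jeff

I think this code is clever too. But how do you think its speed compares with 'merge'? – xirururu

@xirururu read my edit; it all depends on which part you may be repeating. – Leb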
