Python/pandas - Add a word based on words in another column

I'm working with an xlsx file in pandas, and I want to add the word "bodypart" to a column if the preceding column contains a word from a predefined list of bodyparts.

Original dataframe:

Sentence  Type
my hand   NaN
the fish  NaN

Resulting dataframe:

Sentence  Type
my hand   bodypart
the fish  NaN

Nothing I've tried works, and I feel like I'm missing something very obvious. Here is my latest (failed) attempt:

import pandas as pd 
import numpy as np 
bodyparts = ['lip ', 'lips ', 'foot ', 'feet ', 'heel ', 'heels ', 'hand ', 'hands '] 

df = pd.read_excel(file) 

for word in bodyparts : 
    if word in df["Sentence"] : df["Type"] = df["Type"].replace(np.nan, "bodypart", regex = True)

I also tried str.replace, using "" (as below) and the variants "nan" and NaN as its first argument:

if word in df['Sentence'] : df["Type"] = df["Type"].str.replace("", "bodypart")
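Both attempts share the same pitfall, shown below on a small illustrative Series (s is not from the question). The in operator on a pandas Series tests the index labels, not the values, so the if never fires. Even if it did, the assignment rewrites the whole Type column rather than only the matching rows.

```python
import pandas as pd

# Illustrative data mirroring the question's Sentence column
s = pd.Series(['my hand', 'the fish'])

# `in` on a Series looks at the index labels (0 and 1 here), not the values
print('my hand' in s)          # False
print(0 in s)                  # True

# Membership in the values has to be tested explicitly...
print('my hand' in s.values)   # True
# ...and it compares whole cells, so a single word is still not found
print('hand' in s.values)      # False
```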

Any help would be greatly appreciated!


2017-03-10 Merlin

Look into the 'pandas.DataFrame.apply()' method. You can create a function that returns "bodypart" if the input is in the bodyparts list, then apply that function row by row. – Jakub
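A minimal sketch of that suggestion, under a few assumptions: the bodyparts list is stored without the trailing spaces, and the helper name label_row is hypothetical. The function checks each word of the sentence, because a whole sentence such as "my hand" is never itself an element of the list.

```python
import numpy as np
import pandas as pd

# Assumption: words stored without the trailing spaces used in the question
bodyparts = ['lip', 'lips', 'foot', 'feet', 'heel', 'heels', 'hand', 'hands']

df = pd.DataFrame({'Sentence': ['my hand', 'the fish'],
                   'Type': [np.nan, np.nan]})

def label_row(row):
    """Hypothetical helper: 'bodypart' if any word of the sentence is listed."""
    words = row['Sentence'].split()
    return 'bodypart' if any(w in bodyparts for w in words) else np.nan

# axis=1 passes each row (as a Series) to label_row
df['Type'] = df.apply(label_row, axis=1)
print(df)
```

Since only the Sentence value is used, df['Sentence'].apply(...) with a function of one string would work just as well and avoids building a Series per row.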

You can build a regex that searches on word boundaries, then pass it as the argument to str.contains, e.g.:

import pandas as pd 
import numpy as np 
import re 

bodyparts = ['lips?', 'foot', 'feet', 'heels?', 'hands?', 'legs?'] 
rx = re.compile('|'.join(r'\b{}\b'.format(el) for el in bodyparts)) 

df = pd.DataFrame({ 
    'Sentence': ['my hand', 'the fish', 'the rabbit leg', 'hand over', 'something', 'cabbage', 'slippage'], 
    'Type': [np.nan] * 7 
}) 

df.loc[df.Sentence.str.contains(rx), 'Type'] = 'bodypart'

Which gives you:

         Sentence      Type
0         my hand  bodypart
1        the fish       NaN
2  the rabbit leg  bodypart
3       hand over  bodypart
4       something       NaN
5         cabbage       NaN
6        slippage       NaN
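To see what the compiled pattern looks like, and why the word boundaries matter ("slippage" contains "lip"), the pattern can be inspected directly. The second half is a sketch assuming you start from a plain word list like the one in the question: strip the stray spaces and pass each word through re.escape, which only matters if a word could contain regex metacharacters. The name plain_words is illustrative.

```python
import re

bodyparts = ['lips?', 'foot', 'feet', 'heels?', 'hands?', 'legs?']
rx = re.compile('|'.join(r'\b{}\b'.format(el) for el in bodyparts))

print(rx.pattern)
# \blips?\b|\bfoot\b|\bfeet\b|\bheels?\b|\bhands?\b|\blegs?\b

# \b needs a word boundary on both sides, so "lip" inside "slippage" is ignored
print(bool(rx.search('slippage')))   # False
print(bool(rx.search('hand over')))  # True

# Same kind of pattern built from a plain list (illustrative name)
plain_words = ['lip ', 'lips ', 'foot ', 'feet ', 'hand ', 'hands ']
alternatives = '|'.join(re.escape(w.strip()) for w in plain_words)
rx_plain = re.compile(r'\b(?:{})\b'.format(alternatives))
print(bool(rx_plain.search('my hand')))  # True
print(bool(rx_plain.search('cabbage')))  # False
```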


2017-03-10 16:50:45

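A dirty solution would involve checking the intersection of two sets.

Set A is your list of body parts; set B is the set of words in the sentence.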

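# bodyparts without trailing spaces, e.g. {'lip', 'lips', 'hand', 'hands'}
# A non-empty intersection means the sentence contains a listed word
df['Type'] = df['Sentence'].apply(
    lambda x: 'bodypart' if set(x.split()).intersection(bodyparts) else None)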
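A self-contained sketch of the set-intersection idea on sample data (the "two feet" row is illustrative), assuming the body-part words are stored without trailing spaces: str.split() drops the spaces, so 'hand ' would never equal 'hand'. A non-empty intersection means the sentence contains at least one listed word; set.isdisjoint expresses the same test without building the intersection set.

```python
import numpy as np
import pandas as pd

bodyparts = {'lip', 'lips', 'foot', 'feet', 'heel', 'heels', 'hand', 'hands'}

df = pd.DataFrame({'Sentence': ['my hand', 'the fish', 'two feet'],
                   'Type': [np.nan] * 3})

# Label a row when its word set shares at least one element with bodyparts
df['Type'] = df['Sentence'].apply(
    lambda x: 'bodypart' if set(x.split()) & bodyparts else np.nan)

# Equivalent test via isdisjoint, which accepts any iterable of words
same = df['Sentence'].apply(
    lambda x: np.nan if bodyparts.isdisjoint(x.split()) else 'bodypart')
print(df)
```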


2017-03-10 16:31:14 putonspectacles

The simplest way, with a set of words:

df.loc[df.Sentence.isin(bodyparts), 'Type'] = 'bodypart'

Before that, you have to drop the spaces in bodyparts:

bodyparts = {'lip','lips','foot','feet','heel','heels','hand','hands'}

df.Sentence.isin(bodyparts) selects the rows, and Type is the column to set. .loc is the indexer that allows the assignment.
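One caveat, checked in the sketch below on illustrative data: isin compares whole cell values, so it only flags a row whose Sentence is exactly one of the words; "my hand" is not flagged. The second part uses a different technique (not from this answer): split each sentence into words with str.split, flatten them with Series.explode (pandas 0.25+), apply isin per word, and fold back to one flag per sentence with groupby(level=0).any().

```python
import numpy as np
import pandas as pd

bodyparts = {'lip', 'lips', 'foot', 'feet', 'heel', 'heels', 'hand', 'hands'}

# Object dtype so that string labels can be written into the column
df = pd.DataFrame({'Sentence': ['my hand', 'the fish', 'hand'],
                   'Type': pd.Series([np.nan] * 3, dtype=object)})

# Whole-cell comparison: only the row that is exactly 'hand' matches
whole_cell = df.Sentence.isin(bodyparts)

# Word-level comparison: one row per word, then back to one flag per sentence
per_word = (df.Sentence.str.split()
              .explode()
              .isin(bodyparts)
              .groupby(level=0)
              .any())

df.loc[per_word, 'Type'] = 'bodypart'
print(df)
```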


2017-03-10 16:45:24
