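Extract sentences containing specific words using pandas
I have an Excel file with a text column. For each row, I need to extract the sentences in the text column that contain specific words.
I tried defining a function to do this.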
import pandas as pd
from nltk.tokenize import sent_tokenize
from nltk.tokenize import word_tokenize

# Read in the Excel file
str_df = pd.read_excel("C:\\Users\\HP\\Desktop\\context.xlsx")

# Return the sentences of `text` whose tokens include `word` (case-sensitive)
def sentence_finder(text, word):
    sentences = sent_tokenize(text)
    return [sent for sent in sentences if word in word_tokenize(sent)]

# Find the context sentences for a single word
str_df['context'] = str_df['text'].apply(sentence_finder, args=('snakes',))

# Write the output file
str_df.to_excel("C:\\Users\\HP\\Desktop\\context_result.xlsx")
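But can someone help me with finding sentences that contain any of several specific words: snakes, venomous, anaconda? A sentence should contain at least one of the words. I can't work out how to handle multiple words with nltk.tokenize.
Words to search for: words = ['snakes','venomous','anaconda']
Input Excel file: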
text
1. Snakes are venomous. Anaconda is venomous.
2. Anaconda lives in Amazon.Amazon is a big forest. It is venomous.
3. Snakes,snakes,snakes everywhere! Mummyyyyyyy!!!The least I expect is an anaconda.Because it is venomous.
4. Python is dangerous too.
Desired output:
A column called context appended next to the text column above. The context column should look like this:
1. [Snakes are venomous.] [Anaconda is venomous.]
2. [Anaconda lives in Amazon.] [It is venomous.]
3. [Snakes,snakes,snakes everywhere!] [The least I expect is an anaconda.Because it is venomous.]
4. NULL
Thanks in advance.
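Please post a [mcve](http://stackoverflow.com/help/mcve) of your 'str_df' along with your desired output. –
@JulienMarrec Edited. Thanks. – user7140275
Your third example joins two sentences at 'Because', which suggests you want coreference resolution, and that is not easy. If you only need to extract sentences, it is much easier (i.e. split the text on !). Also, please show your current output, even if it is wrong. –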
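For the simpler case mentioned in the last comment (plain sentence extraction, no coreference resolution), here is a minimal sketch of one possible approach. It is not the asker's code and it swaps nltk for the standard `re` module. The function name `find_sentences` is hypothetical, and two assumptions are made: a sentence ends at `.`, `!` or `?` (so text such as decimals or abbreviations would be split incorrectly), and matching is case-insensitive so that `Snakes` matches `snakes`. Under these assumptions row 3 yields three separate sentences rather than the merged pair shown in the desired output, and a row with no match yields an empty list rather than NULL.

```python
import re

def find_sentences(text, words):
    """Return the sentences of `text` containing at least one of `words`."""
    targets = {w.lower() for w in words}
    # Assumption: a sentence is a run of characters ending in '.', '!' or '?'.
    # Whitespace after the punctuation is not required, because the sample
    # rows contain sentences such as "Amazon.Amazon".
    candidates = (s.strip() for s in re.findall(r"[^.!?]+[.!?]*", text))
    matches = []
    for sentence in candidates:
        # Compare whole lowercase tokens, so 'snakes' does not match 'snakeskin'.
        tokens = set(re.findall(r"\w+", sentence.lower()))
        if tokens & targets:
            matches.append(sentence)
    return matches


words = ['snakes', 'venomous', 'anaconda']
row = "Anaconda lives in Amazon.Amazon is a big forest. It is venomous."
print(find_sentences(row, words))
# ['Anaconda lives in Amazon.', 'It is venomous.']
```

With the DataFrame from the question, this could presumably be applied the same way as the original function: str_df['context'] = str_df['text'].apply(find_sentences, args=(words,)).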