Searching for a keyword in Python

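I want to write a Python script that searches a document for a keyword and retrieves the whole sentence the keyword appears in. From my research, I saw that acora can be used, but I have not managed to get it working.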

2011-06-30 Ryan

`$ cat document.txt | grep "keyword"` – 2011-06-30 06:25:50

@Franklin That is quite different from what he asked. He wants the sentence. –

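Yes, I realize that grep "keyword" only matches "keyword". But what I'm looking for is this: if the keyword appears, I want to grab the whole sentence that contains it. Any ideas? – Ryan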

That is how you can do it simply in the shell. You should write it in a script yourself.

>>> text = '''this is sentence 1. and that is sentence
       2. and sometimes sentences are good.
       when that's sentence 4, there's a good reason. and that's
       sentence 5.'''
>>> for line in text.split('.'):
...     if 'and' in line:
...         # collapse the line breaks and indentation inside each piece
...         print(' '.join(line.split()))
...
and that is sentence 2
and sometimes sentences are good
and that's sentence 5

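Here I split text with .split('.'), iterate over the pieces, check each one for the word and, and print it if it contains that word.

You should also keep in mind that this is case-sensitive. There is a lot your solution has to account for, such as the fact that ! and ? also end sentences (but sometimes they don't):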

This is a sentence (ha?), or so you think (!), so?

would be split into

This is a sentence (ha
), or so you think (
), so
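Both caveats can be handled with a regular expression. The following is a minimal sketch, not part of the original answer: `find_sentences` is a hypothetical helper name, it ignores case via `re.IGNORECASE`, and it assumes a sentence ends at `.`, `!` or `?` only when that character is followed by whitespace, so punctuation inside parentheses does not split the sentence. Abbreviations such as "e.g. " would still be split incorrectly.

```python
import re

# Assumption: a sentence boundary is whitespace preceded by ., ! or ?
SENTENCE_END = re.compile(r'(?<=[.!?])\s+')

def find_sentences(keyword, text):
    """Return the sentences in text that contain keyword, ignoring case."""
    sentences = SENTENCE_END.split(text.strip())
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    # Collapse internal line breaks so each sentence reads as one line.
    return [' '.join(s.split()) for s in sentences if pattern.search(s)]

text = "This is a sentence (ha?), or so you think (!), so? And here is another one."
print(find_sentences('AND', text))  # ['And here is another one.']
```

Like the loop above, this is a substring match; wrapping the escaped keyword in `\b...\b` would restrict it to whole words.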


2011-06-30 06:32:21

>>> text = """Hello, this is the first sentence. This is the second. 
And this may or may not be the third. Am I right? No? lol...""" 

>>> import re 
>>> s = re.split(r'[.?!:]+', text) 
>>> def search(word, sentences): 
     return [i for i in sentences if re.search(r'\b%s\b' % word, i)] 

>>> search('is', s) 
['Hello, this is the first sentence', ' This is the second']


2011-06-30 06:35:55 JBernardo

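-1: Your function will match the third sentence even though it doesn't contain the word "is". It contains the *sequence* 'is' inside the word "this". – Blair

@Blair Oh, yes. I didn't realize that. It was very simple to fix. You should also downvote the other answers, because they also use a plain substring test ('word' in sentence) to find the answer. – JBernardo

@Blair I can't believe you actually did that. Try to be nice. – JBernardo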

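I don't have much experience with it, but you may be looking for nltk.

Try this: use span_tokenize, find out which span the index of your word falls in, then look up that sentence.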
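The span idea can be sketched without nltk. What follows is a stand-in, not nltk's API: `sentence_spans` is a hypothetical helper that plays the role of `span_tokenize` by yielding `(start, end)` offsets, and it uses a naive regex that treats every `.`, `!` and `?` as a sentence terminator, which nltk's tokenizer handles far more carefully.

```python
import re

def sentence_spans(text):
    """Yield (start, end) offsets of each sentence (naive: ends at ., ! or ?)."""
    # Start at a non-space character, run to the terminator(s), include them.
    for match in re.finditer(r'\S[^.!?]*[.!?]*', text):
        yield match.start(), match.end()

def sentence_containing(keyword, text):
    """Return the sentence holding the first whole-word hit, or None."""
    hit = re.search(r'\b%s\b' % re.escape(keyword), text)
    if hit is None:
        return None
    # Find the span that the keyword's start index falls into.
    for start, end in sentence_spans(text):
        if start <= hit.start() < end:
            return text[start:end]
    return None

text = "Hello, this is the first sentence. This is the second. Am I right? No? lol..."
print(sentence_containing('second', text))  # This is the second.
```

Working with offsets rather than split strings keeps the original text intact, so the matched sentence can be sliced out exactly as it was written.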


2011-06-30 06:36:46 nattofriends

Use the grep or egrep command with Python's subprocess module; it may help you.

e.g.:

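from subprocess import Popen, PIPE

# text=True makes the pipe return str instead of bytes (Python 3.7+)
proc = Popen("grep 'word1' document.txt", shell=True, stdout=PIPE, text=True)
# to search for 2 different words:
# proc = Popen("egrep 'word1|word2' document.txt", shell=True, stdout=PIPE, text=True)
data, _ = proc.communicate()  # read all output and wait for grep to exit
lines = data.splitlines()     # one matching line per list item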


2011-06-30 09:16:39 Yajushi
