Formatting a regular expression in Python

I have a list of words:

wordlist = ['hypothesis' , 'test' , 'results' , 'total']

and a sentence:

sentence = "These tests will benefit in the long run."

I want to check whether any of the words in wordlist occur in the sentence. I know you can check whether they are substrings of the sentence:

for word in wordlist:
    if word in sentence:
        print(word)

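However, with substring matching I start matching things that are not the words in wordlist. For example, here test is reported as a match even though the sentence only contains tests. I could solve my problem with a regular expression, but is it possible to build the regular expression so that it is formatted with each new word? That is, to check whether each word is in the sentence, something like: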

for some_word_goes_in_here in wordlist:
    if re.search('.*(some_word_goes_in_here).*', sentence):
        print(some_word_goes_in_here)

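In this case, though, the regular expression interprets some_word_goes_in_here as the literal pattern to search for, not as the value of the variable some_word_goes_in_here. Is there a way to format the input so that the regular expression searches for the value of some_word_goes_in_here?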
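Both problems described in the question can be reproduced in a few lines. This is a sketch using the question's own wordlist and sentence: the `in` operator reports test because it is a substring of tests, and putting the variable's name inside the pattern string searches for that literal text rather than the variable's value.

```python
import re

wordlist = ['hypothesis', 'test', 'results', 'total']
sentence = "These tests will benefit in the long run."

# `in` on two strings is a substring test, not a whole-word test,
# so "test" is reported because it occurs inside "tests".
substring_hits = [word for word in wordlist if word in sentence]
print(substring_hits)  # ['test']

# Writing the variable's *name* inside the pattern string searches for that
# literal text; the value bound to the variable is never consulted.
some_word_goes_in_here = 'test'
literal_match = re.search('.*(some_word_goes_in_here).*', sentence)
print(literal_match)  # None
```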

2014-01-08 kolonel

If you have a better solution I'm eager to hear it. – kolonel

Try using:

if re.search(r'\b' + word + r'\b', sentence):

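\b is a word boundary: it matches the empty position between your word and a non-word character, or the start or end of the string (word characters are letters, digits and the underscore).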

For example:

>>> import re 
>>> wordlist = ['hypothesis' , 'test' , 'results' , 'total'] 
>>> sentence = "The total results for the test confirm the hypothesis" 
>>> for word in wordlist: 
...  if re.search(r'\b' + word + r'\b', sentence): 
...    print(word)
... 
hypothesis 
test 
results 
total

With your string:

>>> sentence = "These tests will benefit in the long run." 
>>> for word in wordlist: 
...  if re.search(r'\b' + word + r'\b', sentence): 
...   print(word)
... 
>>>

Nothing is printed.
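To see exactly where the \b word boundary does and does not match, here is a quick check against a few strings chosen purely for illustration (they are not from the question). The comments assume ordinary ASCII text; in Python 3, str patterns use Unicode-aware word characters by default.

```python
import re

# \b is zero-width: it matches between a word character ([A-Za-z0-9_] for
# ASCII text) and a non-word character, or at the start/end of the string.
pattern = r'\btest\b'
samples = ['the test passed', 'these tests', 'test_case', 'test-case', 'test']

results = {s: re.search(pattern, s) is not None for s in samples}
for s, matched in results.items():
    print(repr(s), matched)
# 'the test passed' True
# 'these tests' False      ('s' is a word character, so no boundary)
# 'test_case' False        ('_' counts as a word character too)
# 'test-case' True         ('-' is a non-word character)
# 'test' True              (start and end of string count as boundaries)
```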

2014-01-08 10:58:54 Jerry

Thanks. Yes, but in this case nothing should match. – kolonel

@kolonel I used a different string, but let me add yours as well. – Jerry

Don't use 'list' as a variable name; it shadows the built-in type. –

Use \b word boundaries to test for the words:

for word in wordlist:
    if re.search(r'\b{}\b'.format(re.escape(word)), sentence):
        print('{} matched'.format(word))
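The re.escape call only changes anything when a word contains regex metacharacters; none of the words in this question do. A sketch with a hypothetical word 'a.b' (not from the question) shows the difference: unescaped, the dot matches any character.

```python
import re

word = 'a.b'          # hypothetical word containing a regex metacharacter
text = 'axb and a.b'  # hypothetical text for illustration

unescaped = r'\b{}\b'.format(word)             # '.' matches any character
escaped = r'\b{}\b'.format(re.escape(word))    # '.' matches only a literal dot

unescaped_hits = re.findall(unescaped, text)
escaped_hits = re.findall(escaped, text)
print(unescaped_hits)  # ['axb', 'a.b']
print(escaped_hits)    # ['a.b']
```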

But you could also split the sentence into separate words. Putting the word list in a set makes the membership test more efficient:

words = set(wordlist)
if words.intersection(sentence.split()):
    # no looping over `words` required.
    print('at least one word matched')

Demo:

>>> import re 
>>> wordlist = ['hypothesis' , 'test' , 'results' , 'total'] 
>>> sentence = "These tests will benefit in the long run." 
>>> for word in wordlist: 
...  if re.search(r'\b{}\b'.format(re.escape(word)), sentence): 
...   print('{} matched'.format(word))
... 
>>> words = set(wordlist) 
>>> words.intersection(sentence.split()) 
set([]) 
>>> sentence = 'Lets test this hypothesis that the results total the outcome' 
>>> for word in wordlist: 
...  if re.search(r'\b{}\b'.format(re.escape(word)), sentence): 
...   print('{} matched'.format(word))
... 
hypothesis matched 
test matched 
results matched 
total matched 
>>> words.intersection(sentence.split()) 
set(['test', 'total', 'hypothesis', 'results'])

2014-01-08 11:00:57

I was considering using re.escape and decided against it because the words here don't need escaping. In the more general case, though, it's good advice. – Alfe

@MartijnPieters Thanks. – kolonel

@MartijnPieters I think splitting the sentence into words might introduce errors, since finding the boundaries between words is not a trivial task. – kolonel
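One concrete instance of that concern: str.split() splits only on whitespace, so trailing punctuation stays attached to the last word. The sketch below adds a hypothetical word 'run' to show this, and tokenises with re.findall(r'\w+', ...) instead. That swapped-in tokeniser uses the same word-character definition as \b, so it agrees with the regex approach for plain words, but it still splits contractions and hyphenated words apart.

```python
import re

sentence = "These tests will benefit in the long run."
words = {'run', 'test'}   # 'run' is a hypothetical entry added for illustration

# str.split() only splits on whitespace, so punctuation stays attached.
last_token = sentence.split()[-1]
print(last_token)          # run.
split_hits = words.intersection(sentence.split())
print(split_hits)          # set()

# Tokenising with \w+ drops the punctuation before the set lookup.
tokens = re.findall(r'\w+', sentence)
token_hits = words.intersection(tokens)
print(token_hits)          # {'run'}
```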

I would use something like this:

import re

words = "hypothesis test results total".split()
# ^^^ but you can use your literal list if you prefer that
for word in words:
    if re.search(r'\b%s\b' % (word,), sentence):
        print(word)

You can even speed this up by using a single regular expression:

for foundWord in re.findall(r'\b' + r'\b|\b'.join(words) + r'\b', sentence):
    print(foundWord)
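A variant of the single-regex idea (a sketch, not Alfe's exact code) wraps the alternatives in a non-capturing group so one pair of \b anchors covers every word, escapes each word, and compiles the pattern once for reuse. Because the pattern has no capturing groups, findall returns the matched words in the order they appear in the sentence, including repeats.

```python
import re

words = ['hypothesis', 'test', 'results', 'total']

# One alternation inside a non-capturing group (?:...), so a single pair of
# \b anchors applies to every alternative; re.escape guards metacharacters.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b')

found = pattern.findall('Lets test this hypothesis that the results total the outcome')
print(found)      # ['test', 'hypothesis', 'results', 'total']

none_found = pattern.findall("These tests will benefit in the long run.")
print(none_found) # []
```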

2014-01-08 11:03:20 Alfe

Thanks for the solution. – kolonel
