Python RE: finding a specific word in a text document


I am trying to find a specific word within a line of a text document using a regular expression. I tried the code below, but it does not work properly.

import re 
f1 = open('text.txt', 'r') 
for line in f1: 
    m = re.search('(.*)(?<=Dog)Food(.*)', line) 
    m.group(0) 
    print "Found it." 
f1.close()

Error:

Traceback (most recent call last): 
    File "C:\Program Files (x86)\Microsoft Visual Studio 11.0 
ns\Microsoft\Python Tools for Visual Studio\2.0\visualstudi 
0, in exec_file 
    exec(code_obj, global_variables) 
    File "C:\Users\wsdev2\Documents\Visual Studio 2012\Projec 
TML Head Script\HTML_Head_Script.py", line 6, in <module> 
    m.group(0) 
AttributeError: 'NoneType' object has no attribute 'group'

Source

2013-07-02 Noah R

What do you mean by "not working properly"? Please explain. – TerryA

Added the error I am getting. –

Added an answer :) – TerryA

You get AttributeError: 'NoneType' object has no attribute 'group' because no match was found.

re.search() returns None if there is no match, so you can do this:

import re
with open('text.txt', 'r') as myfile:
    for line in myfile:
        m = re.search('(.*)(?<=Dog)Food(.*)', line)
        if m is not None:
            m.group(0)
            print "Found it."
            break # Break out of the loop

Edit: I have edited my answer to use your code. I also used with/as here because it automatically closes the file afterwards (and it looks cool :P).
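To see the None check without needing a file on disk, here is a minimal sketch that runs the question's pattern over a few in-memory lines. The helper name first_matching_line and the sample lines are made up for illustration; they are not part of the original code. It is written so that it should run on both Python 2 and Python 3.

```python
import re

# The pattern from the question, unchanged.
PATTERN = '(.*)(?<=Dog)Food(.*)'

def first_matching_line(lines):
    """Return the first line that contains a match, or None if none does."""
    for line in lines:
        m = re.search(PATTERN, line)
        if m is not None:  # re.search() returns None when nothing matches
            return line
    return None

# Hypothetical sample data standing in for the lines of text.txt.
sample = ["Cat toys are fun.", "This DogFood is good.", "More DogFood here."]

print(first_matching_line(sample))                  # This DogFood is good.
print(first_matching_line(["This Food is good."]))  # None
```

Returning None for "no line matched" mirrors what re.search() itself does, so the caller has to perform the same kind of check before using the result.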

Source

2013-07-02 13:08:22 TerryA

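I tried adding the if statement to the end of the for loop and it did not work. Could you show how to apply this correctly to my code above? It seems a bit confusing. –

@NoahR I have edited my answer. – TerryA

It no longer gives me an error, but it does not return any results either. I am new to regular expressions in Python; is the expression I am using correct? –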

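There are several problems with your program:

m will be None if the line does not match, which is why your program crashes.
Your code will only find the first match in a line, if one exists. You can use the re.finditer() method to get all matches instead.
With .* before and after the word, the pattern also matches when the word appears in the middle of another word, as in DogFooding. That is probably not what you want. Instead, you can use the magic \b atom in your pattern, which the re documentation describes as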

\b Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character…

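You may want to use the special r'' raw string syntax rather than manually doubling the backslashes to escape them.
Using (.*) to capture what comes before and after the match makes the regular expression hard to use, because there will be no non-overlapping matches even if the word appears several times. Instead, use the match.start() and match.end() methods to get the character positions of the match. Python's match objects are documented online.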
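The points about \b and raw strings can be checked with a short sketch like the one below. The variable names are chosen for illustration only, and the test string is one of the sample lines used later in this answer.

```python
import re

original = '(.*)(?<=Dog)Food(.*)'  # pattern from the question
bounded = r'\bDogFood\b'           # whole-word pattern using \b

text = "DogFooding is great."

# The original pattern matches inside the longer word DogFooding.
matches_original = re.search(original, text) is not None
# The \b version needs a word boundary right after DogFood, so it does not.
matches_bounded = re.search(bounded, text) is not None

# A raw string and a manually doubled backslash spell the same pattern.
same_pattern = (r'\bDogFood\b' == '\\bDogFood\\b')
# Without r'' or doubling, '\b' is a single backspace character.
backspace_length = len('\b')
raw_length = len(r'\b')

print(matches_original)  # True
print(matches_bounded)   # False
print(same_pattern)      # True
```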

With that in mind, your code becomes:

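#!/usr/bin/env python2.7

import re

# with closes the file automatically, even if an error occurs
with open('text.txt', 'r') as f1:
    # enumerate(..., 1) numbers the lines starting from 1
    for line_number, line in enumerate(f1, 1):
        # \b anchors the match at word boundaries, so DogFooding is skipped
        for m in re.finditer(r'\bDogFood\b', line):
            print "Found", m.group(0), "line", line_number, "at", m.start(), "-", m.end()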

When run with this as text.txt:

This Food is good. 
This DogFood is good. 
DogFooding is great. 
DogFood DogFood DogFood.

the program prints:

Found DogFood line 2 at 5 - 12 
Found DogFood line 4 at 0 - 7 
Found DogFood line 4 at 8 - 15 
Found DogFood line 4 at 16 - 23
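If you would rather collect the positions than print them, the same idea can be wrapped in a function. This is only a sketch and not part of the original answer: the names WORD and find_word are hypothetical, and the sample list stands in for the lines of text.txt shown above. It should run on both Python 2 and Python 3.

```python
import re

# Compile once, since the same pattern is applied to every line.
WORD = re.compile(r'\bDogFood\b')

def find_word(lines):
    """Return a (line_number, start, end) tuple for every whole-word match."""
    hits = []
    for line_number, line in enumerate(lines, 1):
        for m in WORD.finditer(line):
            hits.append((line_number, m.start(), m.end()))
    return hits

# The same four lines as the text.txt example.
sample = [
    "This Food is good.",
    "This DogFood is good.",
    "DogFooding is great.",
    "DogFood DogFood DogFood.",
]

for hit in find_word(sample):
    print("line %d at %d - %d" % hit)
```

Because find_word accepts any iterable of strings, an open file object can be passed in place of the list.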

Source

2013-07-02 13:22:55 andrewdotn
