Python regular expressions with greedy groups

This is the content of the input file:

sb.txt 
JOHN:ENGINEER:35

These are the patterns I use to evaluate the file:

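import re

# Read the input file line by line; the with-block closes the file afterwards
with open(r'C:\Users\dhiwakarr\PycharmProjects\BasicConcepts\sb.txt', 'r') as finp:
    for line in finp:
        # +? outside the parentheses: each group holds a single character
        biodata1 = re.search(r'([\w\W])+?:([\w\W])+?:([\w\W])+?', line)
        # +? inside the parentheses: each group holds a run of characters
        biodata2 = re.search(r'([\w\W]+?):([\w\W]+?):([\w\W]+?)', line)
        print('line is ' + line)
        # Raw-string labels so that \w and \W are printed literally
        print(r're.search(r([\w\W])+?:([\w\W])+?:([\w\W])+? ' + biodata1.group(1) + ' ' + biodata1.group(2) + ' ' + biodata1.group(3))
        print(r're.search(r([\w\W]+?):([\w\W]+?):([\w\W]+?) ' + biodata2.group(1) + ' ' + biodata2.group(2) + ' ' + biodata2.group(3))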

This is my output:

line is JOHN:ENGINEER:35
re.search(r([\w\W])+?:([\w\W])+?:([\w\W])+? N R 3
re.search(r([\w\W]+?):([\w\W]+?):([\w\W]+?) JOHN ENGINEER 3

I have a couple of questions about the output it produces.

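Why does the first search pattern match the last character of JOHN and of ENGINEER, but the first character of 35? I expected the non-greedy '?' to stop as soon as it found the first character of JOHN and of ENGINEER.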
Can someone help me understand how the position of '+?' affects the output of the two statements, biodata1 and biodata2?
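A self-contained way to reproduce the question's output, offered as a sketch: the file contents are inlined as a string (an assumption, so the snippet runs without sb.txt), and only the captured groups are printed.

```python
import re

# Inline stand-in for the single line of sb.txt (assumed contents)
line = 'JOHN:ENGINEER:35'

# Quantifier outside the group vs. inside the group
biodata1 = re.search(r'([\w\W])+?:([\w\W])+?:([\w\W])+?', line)
biodata2 = re.search(r'([\w\W]+?):([\w\W]+?):([\w\W]+?)', line)

print(biodata1.groups())  # ('N', 'R', '3')
print(biodata2.groups())  # ('JOHN', 'ENGINEER', '3')
```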


2015-06-29 Dhiwakar Ravikumar

What value is each group supposed to contain?

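I'm not looking for any particular value in the groups. I just want to understand why the last characters of JOHN and ENGINEER, 'N' and 'R' respectively, are the ones matched. In the first pattern, biodata1, I expected the non-greedy match to stop as soon as it found the first character. Also, why does biodata2 match the whole of each word?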

So you expect to match the first character of each alphanumeric word?

The difference is where the parentheses are placed.

For biodata1:

([\w\W])+?:([\w\W])+?:([\w\W])+?

Explanation:

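The parentheses enclose a single character class, so each group captures exactly one character; the +? outside the parentheses repeats that one-character group until the following : can match.
Each repetition overwrites the previous capture, so group(1) ends up holding the last character before the first : ('N'), and likewise group(2) holds 'R'.
Group(3) has nothing after it to force further matching, so the lazy +? stops after one repetition and captures the first character after the second : ('3').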
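A short sketch of the "each repetition overwrites the capture" point, using only the standard `re` module: the overall match consumes the whole field, but the group keeps only its final iteration.

```python
import re

m = re.search(r'([\w\W])+?:', 'JOHN:ENGINEER:35')

# The whole match still consumes every character up to the first ':'
print(m.group(0))  # JOHN:
# The group was re-entered once per character; only the last pass survives
print(m.group(1))  # N
print(m.span(1))   # (3, 4): the 'N' immediately before ':'

# Same effect for the full biodata1 pattern: the match ends right after '3'
full = re.search(r'([\w\W])+?:([\w\W])+?:([\w\W])+?', 'JOHN:ENGINEER:35')
print(full.group(0))  # JOHN:ENGINEER:3
```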

For biodata2:

([\w\W]+?):([\w\W]+?):([\w\W]+?)

Explanation:

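The +? is inside the parentheses, so each group captures the whole run of one or more characters (word and non-word alike) before the :, and the lazy quantifier expands only until the following : can match. That gives 'JOHN' for group(1),
and likewise 'ENGINEER' for group(2).
Group(3), however, has nothing after it, so the lazy +? stops at its minimum of one character and captures only '3', not '35'.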
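To see why group(3) stops at '3', compare the biodata2 pattern with a variant anchored by `$` (a hypothetical change, not from the thread): the anchor gives the last lazy group a reason to keep expanding. In Python, `$` also matches just before a trailing newline, so the variant behaves the same on a line read from the file.

```python
import re

pattern_lazy = r'([\w\W]+?):([\w\W]+?):([\w\W]+?)'
# Hypothetical variant: $ forces the last lazy group to reach the end of the line
pattern_anchored = r'([\w\W]+?):([\w\W]+?):([\w\W]+?)$'

for text in ('JOHN:ENGINEER:35', 'JOHN:ENGINEER:35\n'):
    print(re.search(pattern_lazy, text).groups())      # ('JOHN', 'ENGINEER', '3')
    print(re.search(pattern_anchored, text).groups())  # ('JOHN', 'ENGINEER', '35')
```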

+?:

This matches one or more repetitions of the preceding token, but lazily: it takes as few as possible (at least one) and extends only as far as needed for the rest of the pattern to match.
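A minimal comparison of greedy `+` and lazy `+?` on the same input, using the standard `re` module:

```python
import re

text = 'JOHN:ENGINEER:35'

# Greedy +: grab as much as possible, then backtrack to the LAST ':'
print(re.search(r'[\w\W]+:', text).group())   # JOHN:ENGINEER:
# Lazy +?: take one character, then extend only until the FIRST ':' fits
print(re.search(r'[\w\W]+?:', text).group())  # JOHN:
# With nothing after it, the lazy quantifier stops at its minimum of one
print(re.search(r'[\w\W]+?', text).group())   # J
```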


2015-06-29 10:16:38 The6thSense

Thanks @Vignesh Kalai, but in biodata1, wouldn't +? stop right after the first character, 'J', is found and then move on to the next group?

No. +? must match at least one character, but it can match more than one. – The6thSense

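In biodata1, each group can hold only one character, and the : after it acts as the end condition: the group keeps repeating until : can match, and since each repetition overwrites the capture, the group holds the single character just before ':'. – The6thSense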
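If the goal is what the asker expected from biodata1 (the first character of each field rather than the last), one option, sketched here as an assumption rather than anything proposed in the thread, is to capture a single character and let a separate, uncaptured lazy run consume the rest of the field up to ':'.

```python
import re

line = 'JOHN:ENGINEER:35'

# Hypothetical pattern: capture one character, then skip the rest of the field
first_chars = re.search(r'([\w\W])[\w\W]*?:([\w\W])[\w\W]*?:([\w\W])', line)
print(first_chars.groups())  # ('J', 'E', '3')
```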
