Python regular expressions: findall() vs. search()

>>> import re
>>> p = re.compile(r"(\b\w+)\s+\1")

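\b : word boundary
\w+ : one or more alphanumeric characters
\s+ : one or more whitespace characters (can be a space, \t, \n, ..)
\1 : backreference to group 1 (= the part between (..))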
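
As a minimal sketch of the breakdown above, the same pattern can be written with re.VERBOSE so that each piece carries its own comment. The variable names here are only illustrative.

```python
import re

# Same pattern as above; re.VERBOSE ignores the layout whitespace
# and allows a comment on each piece.
p = re.compile(r"""
    (\b\w+)   # group 1: word boundary, then one or more word characters
    \s+       # one or more whitespace characters
    \1        # backreference: the exact text that group 1 matched
""", re.VERBOSE)

m = p.search("I am in the the car.")
print(m.group(0))  # the whole match
print(m.group(1))  # only what group 1 captured
```

The whole match and group 1 are different things, which matters for the rest of this question.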

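This regex should find every doubled occurrence of a word, that is, two occurrences right next to each other with only some whitespace in between.
The regex seems to work fine with the search function: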

>>> p.search("I am in the the car.") 

<_sre.SRE_Match object; span=(8, 15), match='the the'>

The match found is the the, just as I expected. The weird behavior is in the findall function:

>>> p.findall("I am in the the car.") 

['the']

Now the match found is only the. Why the difference?

2017-04-17 K.Mulier

Because 'findall' returns only the capture groups (or, if there are none, the full match).

https://docs.python.org/3/library/re.html#re.findall "If one or more groups are present in the pattern, return a list of groups" – melpomene

Oh, now I see. Thank you. So I have to use a non-capturing group to fix this? I'll try it right away.

When the regex contains groups, findall() returns only the groups; from the documentation:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
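
The quoted rule can be checked with a small sketch that runs findall() with zero, one, and two capturing groups. The patterns are deliberately trivial and chosen only for illustration.

```python
import re

text = "I am in the the car."

# No capturing group: findall returns the full matches.
no_group = re.findall(r"the", text)

# One capturing group: findall returns only what that group captured.
one_group = re.findall(r"t(he)", text)

# Two capturing groups: findall returns one tuple per match.
two_groups = re.findall(r"(t)(he)", text)

print(no_group)    # ['the', 'the']
print(one_group)   # ['he', 'he']
print(two_groups)  # [('t', 'he'), ('t', 'he')]
```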

You can't avoid using a group when you use a backreference, but you can put a new group around the whole pattern:

>>> p = re.compile(r"((\b\w+)\s+\2)") 
>>> p.findall("I am in the the car.") 
[('the the', 'the')]

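The outer group is group 1, so the backreference has to point to group 2. You now have two groups, so each entry is a tuple of two results. Using a named group may make this more readable: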

>>> p = re.compile(r"((?P<word>\b\w+)\s+(?P=word))")

You can filter that back down to just the outer group's result:

>>> [m[0] for m in p.findall("I am in the the car.")] 
['the the']
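
An alternative that is not part of the answer above: re.finditer() yields match objects rather than group contents, so the original single-group pattern can stay unchanged and group(0) gives the full match. A minimal sketch, with a second doubled word added to the sample text for illustration:

```python
import re

# The original pattern from the question, with no extra outer group.
p = re.compile(r"(\b\w+)\s+\1")
text = "I am in the the car. It is is red."

# finditer yields one match object per match; group(0) is the whole match.
matches = [m.group(0) for m in p.finditer(text)]
print(matches)  # ['the the', 'is is']
```

This keeps the group numbering simple, at the cost of a list comprehension instead of a single findall() call.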

2017-04-17 14:31:15

Great answer! Thanks Martijn :-)
