python findall only finds the last occurrence

I want to extract some data from a web page using Python 2.7.5.

Code:

import re

p = re.compile(r'.*<section\s*id="(.+)">(.+)</section>.*')
str = 'df <section id="1">2</section> fdd <section id="3">4</section> fd'  # note: shadows the built-in str
m = p.findall(str)
for eachentry in m:
    print 'id=[{}], text=[{}]'.format(eachentry[0], eachentry[1])

Output:

id=[3], text=[4]

Why does it extract only the last occurrence? If I delete the last occurrence, it finds the first one.


2014-02-07 4ntoine

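The .* at the start is greedy: it consumes as much as it can and gives back only enough for the rest of the pattern to match, so the captured section is the last one. The trailing .* then consumes the rest of the string, which leaves nothing for findall to find. In fact, every .* and .+ in the expression is greedy. So we make them non-greedy with ?, like this: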

p = re.compile(r'.*?<section\s*id="(.+?)">(.+?)</section>.*?')

and the output becomes:

id=[1], text=[2] 
id=[3], text=[4]
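A small sketch comparing the two patterns on the same input may make the greedy behaviour concrete. It uses only the standard re module; the names greedy, lazy and s are illustrative, and print() is written so it runs on both Python 2.7 and 3.

```python
import re

s = 'df <section id="1">2</section> fdd <section id="3">4</section> fd'

# Pattern from the question: every .* and .+ is greedy.
greedy = re.compile(r'.*<section\s*id="(.+)">(.+)</section>.*')
# Same pattern with every quantifier made non-greedy, as in this answer.
lazy = re.compile(r'.*?<section\s*id="(.+?)">(.+?)</section>.*?')

# The greedy match starts at index 0 and runs to the end of the string,
# so findall has nothing left to scan after that single match.
print(greedy.search(s).span() == (0, len(s)))  # True
print(greedy.findall(s))                       # [('3', '4')]

# The non-greedy version stops right after each </section>,
# so findall can continue and pick up the next section.
print(lazy.findall(s))                         # [('1', '2'), ('3', '4')]
```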

In fact, you can drop the leading and trailing .* patterns and keep it simple, like this:

p = re.compile(r'<section\s*id="(.+?)">(.+?)</section>')
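Dropping the outer .*? is safe because findall already scans the whole string, trying each start position after the previous match. A minimal check, assuming the same sample input as the question (variable names are illustrative):

```python
import re

s = 'df <section id="1">2</section> fdd <section id="3">4</section> fd'

with_padding = re.compile(r'.*?<section\s*id="(.+?)">(.+?)</section>.*?')
simple = re.compile(r'<section\s*id="(.+?)">(.+?)</section>')

# findall searches the string left to right on its own,
# so the surrounding .*? does not change the captured groups.
print(with_padding.findall(s) == simple.findall(s))  # True

# With two groups, findall returns a list of (id, text) tuples.
for id_, text in simple.findall(s):
    print('id=[{}], text=[{}]'.format(id_, text))
```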


2014-02-07 06:24:50 thefourtheye

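How should this be changed if str is a multi-line string? If I add re.MULTILINE, as in compile(..., re.MULTILINE), nothing is found, but if I do replace('\n', '') before findall, it works. – 4ntoine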

Found it myself: I need to use the re.DOTALL flag. – 4ntoine
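The reason: by default . does not match a newline, and re.MULTILINE only changes what ^ and $ match, so it does not help here. re.DOTALL makes . match '\n' too. A short sketch, assuming a hypothetical input where one section body spans two lines and the simplified pattern from the answer above:

```python
import re

# Hypothetical multi-line input: the section body spans two lines.
s = 'df <section id="1">line one\nline two</section> fd'

pattern = r'<section\s*id="(.+?)">(.+?)</section>'

# Without flags, . does not match '\n', so the body cannot span lines.
print(re.findall(pattern, s))                # []
# re.MULTILINE only affects ^ and $; . still stops at '\n'.
print(re.findall(pattern, s, re.MULTILINE))  # []
# re.DOTALL lets . match '\n' as well.
print(re.findall(pattern, s, re.DOTALL))
```

The same flag can be passed to re.compile(pattern, re.DOTALL) when the pattern is reused.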

Your regex needs to be changed as follows:

p = re.compile(r'<section\s*id="(.+?)">(.+?)</section>')
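Both ? quantifiers in this pattern matter: removing the leading and trailing .* alone is not enough if the groups stay greedy. A sketch on the question's input (variable names are illustrative):

```python
import re

s = 'df <section id="1">2</section> fdd <section id="3">4</section> fd'

# Outer .* removed, but the groups are still greedy.
greedy_groups = re.compile(r'<section\s*id="(.+)">(.+)</section>')
# The pattern from this answer, with non-greedy groups.
lazy_groups = re.compile(r'<section\s*id="(.+?)">(.+?)</section>')

# The first (.+) runs to the last '">' in the string, swallowing the first section.
print(greedy_groups.findall(s))  # [('1">2</section> fdd <section id="3', '4')]
print(lazy_groups.findall(s))    # [('1', '2'), ('3', '4')]
```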


2014-02-07 06:25:58
