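Matching multiple lines in a Python regular expression

I want to extract the data between <tr> tags from an HTML page. I used the code below, but I get no results. The HTML inside the <tr> tags spans multiple lines.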

category =re.findall('<tr>(.*?)</tr>',data);

Please suggest a fix for this problem.

Source

2010-02-04 Sreejith Sasidharan

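Read the docs: http://docs.python.org/library/re.html#re.S – SilentGhost 2010-02-04 12:24:35

Or the paragraph just above it: http://docs.python.org/library/re.html#re.MULTILINE :) – 2010-02-04 12:27:00

@Tomasz: but *do* read beyond the header ;) – SilentGhost 2010-02-04 12:35:20

Just to clarify the issue: despite all these links to re.M, it will not work here, as a quick skim of its documentation would reveal. You need re.S, if you are not going to parse the HTML properly, of course: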

>>> import re
>>> doc = """<table border="1">
    <tr> 
     <td>row 1, cell 1</td> 
     <td>row 1, cell 2</td> 
    </tr> 
    <tr> 
     <td>row 2, cell 1</td> 
     <td>row 2, cell 2</td> 
    </tr> 
</table>""" 

>>> re.findall('<tr>(.*?)</tr>', doc, re.S) 
['\n  <td>row 1, cell 1</td>\n  <td>row 1, cell 2</td>\n ', 
'\n  <td>row 2, cell 1</td>\n  <td>row 2, cell 2</td>\n '] 
>>> re.findall('<tr>(.*?)</tr>', doc, re.M) 
[]
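
To make the difference between the two flags concrete, here is a minimal sketch on a shorter string. The sample text and variable names are made up for illustration; only re.findall, re.M and re.S come from the answer above.

```python
import re

text = "<tr>\nfirst\n</tr>"

# re.M (re.MULTILINE) only changes ^ and $ so that they match at every
# line boundary. "." still stops at a newline, so nothing is found here.
with_m = re.findall(r"<tr>(.*?)</tr>", text, re.M)

# re.S (re.DOTALL) lets "." match newlines too, so the match can span lines.
with_s = re.findall(r"<tr>(.*?)</tr>", text, re.S)

# What re.M is actually for: anchoring a pattern at the start of each line.
line_starts = re.findall(r"^\w+", "alpha\nbeta\ngamma", re.M)

print(with_m)       # []
print(with_s)       # ['\nfirst\n']
print(line_starts)  # ['alpha', 'beta', 'gamma']
```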

Source

2010-02-04 12:52:05 SilentGhost

re.findall('<tr>(.*?)</tr>', doc, re.S) can also be written as re.findall('(?s)<tr>(.*?)</tr>', doc). – tzot 2010-02-04 22:07:45
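
A quick way to check that the flag argument, the inline (?s) flag, and a compiled pattern behave the same is sketched below. The sample string is an assumption made up for the demonstration. Note that a global inline flag such as (?s) should be placed at the very start of the pattern.

```python
import re

doc = "<tr>\n<td>a</td>\n</tr><tr>\n<td>b</td>\n</tr>"

# 1. Flag passed as an argument.
flag_arg = re.findall(r"<tr>(.*?)</tr>", doc, re.S)

# 2. The same flag written inline at the start of the pattern.
inline_flag = re.findall(r"(?s)<tr>(.*?)</tr>", doc)

# 3. A precompiled pattern; re.DOTALL is the long name of re.S.
compiled = re.compile(r"<tr>(.*?)</tr>", re.DOTALL).findall(doc)

print(flag_arg == inline_flag == compiled)  # True
```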

Thanks, the re.S fix works. – 2010-02-05 06:19:10

Don't use regular expressions to parse HTML. Use an HTML parser such as lxml or BeautifulSoup.
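
If installing a third-party package is not an option, the same idea can be sketched with the standard library's html.parser. This is a swapped-in technique, not lxml or BeautifulSoup, and the RowCollector class and its attributes are hypothetical names invented for this example.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <td>.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by malformed markup


doc = """<table border="1">
    <tr>
        <td>row 1, cell 1</td>
        <td>row 1, cell 2</td>
    </tr>
    <tr>
        <td>row 2, cell 1</td>
        <td>row 2, cell 2</td>
    </tr>
</table>"""

collector = RowCollector()
collector.feed(doc)
collector.close()
print(collector.rows)
```

Unlike the regex, this does not care whether a row spans one line or many, and it keeps working if the <tr> tags carry attributes.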

Source

2010-02-04 12:24:20

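import re
# re.DOTALL lets "." match newlines; re.M is harmless here but not required
pat = re.compile('<tr>(.*?)</tr>', re.DOTALL | re.M)
print(pat.findall(data))

Or a non-regex way:

for item in data.split("</tr>"):
    if "<tr>" in item:
        # keep only what follows the opening tag
        print(item[item.find("<tr>") + len("<tr>"):])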
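
The split-based approach can also be wrapped in a small function that returns the rows instead of printing them. This is only a sketch: rows_between_tags and its parameters are hypothetical names, and it matches the exact tag text only, so a tag written as <tr class="x"> would be missed.

```python
def rows_between_tags(data, open_tag="<tr>", close_tag="</tr>"):
    """Return the text between each open_tag/close_tag pair, in order."""
    rows = []
    for chunk in data.split(close_tag):
        # partition() returns (before, separator, after); the separator is
        # empty when open_tag does not occur in this chunk.
        _, found, inner = chunk.partition(open_tag)
        if found:
            rows.append(inner)
    return rows


data = "<table>\n<tr>\n<td>a</td>\n</tr>\n<tr>\n<td>b</td>\n</tr>\n</table>"
rows = rows_between_tags(data)
print(rows)  # ['\n<td>a</td>\n', '\n<td>b</td>\n']
```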

Source

2010-02-04 12:33:48 ghostdog74

Don't use a regular expression; use an HTML parser such as BeautifulSoup:

html = '<html><body>foo<tr>bar</tr>baz<tr>qux</tr></body></html>' 

# BeautifulSoup 4 is distributed as the bs4 package
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
print(soup.findAll("tr"))

Result:

[<tr>bar</tr>, <tr>qux</tr>]

If you just want the contents, without the tr tags:

for tr in soup.findAll("tr"):
    # .contents is the list of the tag's children
    print(tr.contents)

Result:

['bar']
['qux']

Using an HTML parser isn't as scary as it sounds! And it will be more reliable than any regex that gets posted here.

Source

2010-02-04 12:36:33

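As others have suggested, the specific problem you are having can be solved by allowing matches to span multiple lines. Note that the flag that does this is re.DOTALL (re.S), not re.MULTILINE, which only changes how ^ and $ match.

However, you are going down a treacherous path by parsing HTML with regular expressions. Use an XML/HTML parser instead; BeautifulSoup works great for this!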

doc = """<table border="1"> 
    <tr> 
     <td>row 1, cell 1</td> 
     <td>row 1, cell 2</td> 
    </tr> 
    <tr> 
     <td>row 2, cell 1</td> 
     <td>row 2, cell 2</td> 
    </tr> 
</table>""" 

# BeautifulSoup 4 is distributed as the bs4 package
from bs4 import BeautifulSoup
soup = BeautifulSoup(doc, "html.parser")
all_trs = soup.findAll("tr")

Source

2010-02-04 12:45:54
