Python regular expression help

I'm trying to sort through HTML tags, and I can't seem to get it right.

What I have done so far:

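import urllib  # Python 2: urllib.urlopen and raw_input
import re

url = raw_input('Enter URL: ')
f = urllib.urlopen(url)
s = f.read()
f.close()  # the call parentheses were missing
# Raw string so that \b is a word boundary rather than a backspace character
r = re.compile(r'<TAG\b[^>]*>(.*?)</TAG>')
result = re.findall(r, s)
print(result)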

where I replace "TAG" with the tag I want to see.
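One way to avoid editing the pattern by hand for each tag is to build it from the tag name. The following is only a sketch for Python 3: the function name `find_tag_contents` and the sample string are made up for illustration, and it still shares the usual limits of regex-based HTML matching (nested tags of the same name, comments, and so on).

```python
import re

def find_tag_contents(html, tag):
    """Return the text between each <tag ...> and the next </tag>."""
    # re.escape guards against regex metacharacters in the tag name.
    # \b keeps "p" from also matching "<pre>"; DOTALL lets "." span newlines.
    pattern = re.compile(
        r'<%s\b[^>]*>(.*?)</%s>' % (re.escape(tag), re.escape(tag)),
        re.IGNORECASE | re.DOTALL,
    )
    return pattern.findall(html)

# Hypothetical sample input, not taken from a real page
sample = '<p id="a">one <b>bold</b></p>\n<P>two\nlines</P><pre>skip</pre>'
print(find_tag_contents(sample, 'p'))
# ['one <b>bold</b>', 'two\nlines']
```

The raw string matters here: in a normal string literal, `'\b'` is a backspace character, so the word boundary is never seen by the regex engine.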

Thanks in advance.

Source

2011-01-31 Krayons

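Use an XML parser to parse HTML. Obligatory link: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – 2011-01-31 22:05:02

Don't parse HTML with regular expressions. Regular expressions are not a sophisticated enough tool for parsing HTML. If someone asks you to do it, hit them over the head with a stick and then use BeautifulSoup. It will be less painful for both of you. – 2011-01-31 22:27:09

What results are you currently getting? – Eli 2011-01-31 22:27:18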

An example using BeautifulSoup (BS) looks like this:

from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 import path
doc = ['<html><head><title>Page title</title></head>',
       '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.',
       '<p id="secondpara" align="blah">This is paragraph <b>two</b>.',
       '</html>']
soup = BeautifulSoup(''.join(doc))
print(soup.findAll('b'))
# [<b>one</b>, <b>two</b>]

As for a regular expression, you can use:

import re

aa = doc[0]
# aa == '<html><head><title>Page title</title></head>'
pt = re.compile(r'(?<=<title>).*?(?=</title>)')
print(re.findall(pt, aa))
# ['Page title']
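A side note on the lookaround pattern: Python's `re` module only accepts fixed-width lookbehinds, so `(?<=<title>)` cannot be widened to allow attributes on the tag. A capturing group does not have that restriction. The comparison below is a small sketch for Python 3; the variable names and the second sample string are invented for illustration.

```python
import re

lookaround = re.compile(r'(?<=<title>).*?(?=</title>)')
grouped = re.compile(r'<title\b[^>]*>(.*?)</title>')

plain = '<html><head><title>Page title</title></head>'
with_attr = '<title lang="en">Page title</title>'  # hypothetical variant

print(lookaround.findall(plain))      # ['Page title']
print(grouped.findall(plain))         # ['Page title']
print(lookaround.findall(with_attr))  # []
print(grouped.findall(with_attr))     # ['Page title']
```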

Source

2011-01-31 23:38:16 gerry

You should really try using a library that can do HTML parsing. Beautiful Soup is one of my favorites.

Source
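If installing a third-party library is not an option, the standard library's `html.parser` module (Python 3) can also do this kind of extraction. The sketch below is an assumption about what the asker wants (the text inside every occurrence of one tag); the class name `TagTextCollector` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

class TagTextCollector(HTMLParser):
    """Collect the text found inside every occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0     # how many target tags are currently open
        self.parts = []    # text pieces of the element being read
        self.results = []  # finished text, one entry per outermost element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Outermost element closed: store its accumulated text
                self.results.append(''.join(self.parts))
                self.parts = []

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)

collector = TagTextCollector('div')
collector.feed('<div class="x">outer <div>inner</div> tail</div><p>ignored</p>')
collector.close()
print(collector.results)
# ['outer inner tail']
```

Because the parser tracks how many target tags are open, nested elements of the same name are handled, which is the case a single regex tends to get wrong.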

2011-01-31 22:00:08 Miguel

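It's not entirely clear to me what you want to achieve with the regular expression. Capturing, for example, the content between two div tags with

re.compile("<div.*?>.*?</div>")

works, although you will run into problems with this pattern when divs are nested.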
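To make the nesting problem concrete, here is a short Python 3 sketch that runs the pattern above against a made-up nested input. The lazy `.*?` stops at the first `</div>` it finds, so the match ends at the inner closing tag and the rest of the outer div is lost.

```python
import re

pattern = re.compile(r"<div.*?>.*?</div>")

# Hypothetical input with one div nested inside another
nested = "<div>outer <div>inner</div> tail</div>"

matches = pattern.findall(nested)
print(matches)
# ['<div>outer <div>inner</div>']
```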

Source

2011-01-31 22:22:41
