How do I get an image src using a Python regular expression?

How can I use a Python regular expression to get the image src from the following HTML string?

<td width="80" align="center" valign="top"><a href="http://news.google.com/news/url?sa=t&fd=R&usg=AFQjCNFqz8ZCIf6NjgPPiTd2LIrByKYLWA&url=http://www.news.com.au/business/spain-victory-faces-market-test/story-fn7mjon9-1226390697278"><img src="//nt3.ggpht.com/news/tbn/380jt5xHH6l_FM/6.jpg" alt="" border="1" width="80" height="80" /> NEWS.com.au</a></td>

To get the image src from the HTML string above, I tried:

matches = re.search('@src="([^"]+)"',text) 
print(matches[0])

but I got nothing back.

Source

2012-06-10 Don Li

What is the '@' character supposed to match? There is no such character in the input string. –

Regex and HTML? – Ben

http://stackoverflow.com/a/1732454/311220 – Acorn

Just drop the @ from the regex and it will work.
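A minimal sketch of that fix, assuming `text` holds the HTML string from the question. The `if match` guard is an addition for illustration: `re.search` returns `None` when nothing matches, which is why indexing the result failed with the original pattern.

```python
import re

# The HTML string from the question, split across lines for readability.
text = (
    '<td width="80" align="center" valign="top">'
    '<a href="http://news.google.com/news/url?sa=t&fd=R'
    '&usg=AFQjCNFqz8ZCIf6NjgPPiTd2LIrByKYLWA'
    '&url=http://www.news.com.au/business/spain-victory-faces-market-test/'
    'story-fn7mjon9-1226390697278">'
    '<img src="//nt3.ggpht.com/news/tbn/380jt5xHH6l_FM/6.jpg" alt="" '
    'border="1" width="80" height="80" /> NEWS.com.au</a></td>'
)

# Same pattern as in the question, minus the leading '@'.
match = re.search(r'src="([^"]+)"', text)

# re.search returns None when nothing matches, so check before using it.
if match:
    print(match.group(0))  # whole match, including the src="..." wrapper
    print(match.group(1))  # first capture group: just the URL
```

Note that index 0 (as in `matches[0]` in the question) is the whole match; the URL on its own is in group 1.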

Source

2012-06-10 20:26:00 buckley


You can simplify your regex a little:

match = re.search(r'src="(.*?)"', text)

Source

2012-06-10 20:30:07

It picks up the JavaScript file as well. –
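One way to address that, sketched under assumptions: the `html` fragment below is hypothetical (a `<script>` tag placed before the image), and the stricter pattern is only an illustration that requires the attribute to sit inside an `<img ...>` tag. It handles double-quoted attributes only and is not a general HTML parser.

```python
import re

# Hypothetical page fragment: a <script> tag precedes the image.
html = (
    '<script src="/static/app.js"></script>'
    '<a href="#"><img src="//nt3.ggpht.com/news/tbn/380jt5xHH6l_FM/6.jpg" alt="" /></a>'
)

# The bare pattern stops at the first src attribute, which is the script here.
first_src = re.search(r'src="(.*?)"', html).group(1)

# Requiring an <img ...> prefix skips src attributes on other tags.
img_srcs = re.findall(r'<img\b[^>]*?\bsrc="([^"]+)"', html)

print(first_src)
print(img_srcs)
```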

Instead of a regex, you could consider using BeautifulSoup:

>>> from bs4 import BeautifulSoup 
>>> soup = BeautifulSoup(junk) 
>>> soup.findAll('img') 
[<img src="//nt3.ggpht.com/news/tbn/380jt5xHH6l_FM/6.jpg" alt="" border="1" width="80" height="80" />] 
>>> soup.findAll('img')[0]['src'] 
u'//nt3.ggpht.com/news/tbn/380jt5xHH6l_FM/6.jpg'

Source

2012-06-10 20:33:12 fraxel

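Wouldn't Beautiful Soup add a lot of overhead to the solution? 'img' tags are relatively easy to parse (and since they contain no other text, they are usually well-formed) –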
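If the extra dependency is the concern, a middle ground is the standard library's `html.parser` module. This is not something proposed in the thread, just a sketch: `ImgSrcCollector` is a hypothetical helper name, and the `html` string is a shortened stand-in for the markup in the question.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Hypothetical helper: collects the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "img":
            src = dict(attrs).get("src")
            if src is not None:
                self.sources.append(src)


# Shortened stand-in for the HTML in the question.
html = (
    '<td><a href="http://www.news.com.au/">'
    '<img src="//nt3.ggpht.com/news/tbn/380jt5xHH6l_FM/6.jpg" alt="" '
    'border="1" width="80" height="80" /> NEWS.com.au</a></td>'
)

collector = ImgSrcCollector()
collector.feed(html)
print(collector.sources)
```

Self-closing tags such as `<img ... />` are routed through `handle_startendtag`, which by default calls `handle_starttag`, so the helper above sees them too.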
