Extracting the title from my HTML line

-3

I have a question about extracting the title from an HTML line.

Say my line is:

<span class="title_name"> <a href="/?id=2124">Fairwood</a></span>

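And lol, I had to add some extra spaces so that the line would not be displayed as a hyperlink.

How would I go about automatically extracting "Fairwood", given a number of similarly formatted lines with different IDs and titles?

Thanks in advance

Source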

2017-06-09 Ryan Xu

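Why the downvotes? A short comment would be more helpful. –

Search for the string 'href', then start capturing right after you encounter a '>', until you find a '<' – Haris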
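
The string-search approach described in the comment above could be sketched roughly as follows. The function name `extract_title` is hypothetical, and the sketch assumes each line contains a single `href` attribute whose value has no '>' in it, followed directly by the link text.

```python
def extract_title(line):
    """Return the text after the first href's closing '>', or None if not found."""
    href_pos = line.find("href")
    if href_pos == -1:
        return None
    # The first '>' after 'href' closes the opening <a ...> tag.
    start = line.find(">", href_pos)
    if start == -1:
        return None
    # The next '<' begins the closing </a> tag.
    end = line.find("<", start + 1)
    if end == -1:
        return None
    return line[start + 1:end].strip()


line = '<span class="title_name"> <a href="/?id=2124">Fairwood</a></span>'
print(extract_title(line))
```

This only holds up while the lines really do share one fixed shape; nested tags inside the link would cut the result short.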

You might want to look at this SO post: https://stackoverflow.com/questions/11709079/parsing-html-using-python, and please never use regular expressions to parse HTML. See https://stackoverflow.com/a/1732454/190823 – Jens

What's wrong with a parser solution?

import xml.etree.ElementTree as ET 
root = ET.fromstring('<span class="title_name"> <a href="/?id=2124">Fairwood</a></span>') 
print(root.find("a").text) 
# Fairwood
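
`ET.fromstring` expects one well-formed XML element, so several lines at once would need a wrapper. As a possible alternative, here is a minimal sketch using the standard library's `html.parser`, assuming every title sits in an `<a>` inside `<span class="title_name">`. The class name `TitleExtractor` and the second sample line are made up for illustration.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect the text of each <a> nested inside <span class="title_name">."""

    def __init__(self):
        super().__init__()
        self.in_title_span = False
        self.in_link = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get("class") == "title_name":
            self.in_title_span = True
        elif tag == "a" and self.in_title_span:
            self.in_link = True
            self.titles.append("")  # start a new title

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False
        elif tag == "span":
            self.in_title_span = False

    def handle_data(self, data):
        if self.in_link:
            self.titles[-1] += data


# Hypothetical input: the second line is invented to show different ids and titles.
sample = '''
<span class="title_name"> <a href="/?id=2124">Fairwood</a></span>
<span class="title_name"> <a href="/?id=2125">Another Title</a></span>
'''
parser = TitleExtractor()
parser.feed(sample)
parser.close()
print(parser.titles)
```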

Source

2017-06-09 09:22:37 Jan

If the format is always the same, you can try:

import re 
html=''' 
<span class="title_name1"> <a href="/?id=2124">Fairwood1</a></span> 
<span class="title_name2"> <a href="/?id=2125">Fairwood2</a></span>''' 
print(re.findall(r'\w+(?=</a></span>)', html, re.M))
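
Note that `\w+` matches only word characters, so a title containing spaces would be cut down to its last word. If a regex is used anyway, a variant with capturing groups might look like the sketch below. It assumes the `href` always has the form `/?id=<digits>`, and the second title is altered here to include spaces purely for illustration.

```python
import re

# Hypothetical input: same shape as above, but the second title contains spaces.
html = '''
<span class="title_name1"> <a href="/?id=2124">Fairwood1</a></span>
<span class="title_name2"> <a href="/?id=2125">Fair wood 2</a></span>'''

# Group 1 captures the numeric id, group 2 everything up to the next '<'.
pattern = re.compile(r'<a href="/\?id=(\d+)">([^<]+)</a>')
pairs = pattern.findall(html)
print(pairs)
```

With two groups, `findall` returns a list of `(id, title)` tuples, which keeps each title tied to its id.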

Source

2017-06-09 09:25:37 Kerwin

You don't need the multiline flag if there are no anchors to match. – Jan
