Python: splitting a tag based on a regular expression

I want to split the following tag, <b size=5 alt=ref>, like this:

Open tag: b 
Parm: size=5 
Parm: alt=ref

However, I tried the code below to split the tag into groups, but it did not work:

temp = '<b size=5 alt=ref>' 
matchObj = re.search(r"(\S*)\s*(\S*)", temp) 
print 'Open tag: ' + matchObj.groups()

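My plan is to split the tag into groups, then print the first group as the open tag and the rest as Parm. Could you suggest some ideas that would help me solve this?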

Note that I am reading the tags from an HTML file; here I have only shown one example open tag, along with the part of the code I am stuck on.
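
A minimal sketch of what goes wrong in the attempt above, assuming Python 3: `re.search` finds only the first match, so the two groups capture `<b` and `size=5`, and `groups()` returns a tuple, which cannot be concatenated to a string. The `re.findall` pattern below is one possible fix, not the only one; the names `groups` and `parts` are illustrative.

```python
import re

temp = '<b size=5 alt=ref>'

# The original pattern: two captured runs of non-whitespace.
match = re.search(r"(\S*)\s*(\S*)", temp)
groups = match.groups()
print(groups)  # ('<b', 'size=5'): a tuple, so 'Open tag: ' + groups raises TypeError

# One possible fix: collect every run of characters that is neither
# an angle bracket nor whitespace.
parts = re.findall(r"[^<>\s]+", temp)
print('Open tag: ' + parts[0])
for parm in parts[1:]:
    print('Parm: ' + parm)
```

This only handles a single, simple open tag; attribute values that contain spaces or angle brackets would need a real parser.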

Thanks

2015-10-14 Nasser

Is there a reason you are not using an HTML parser?

If you [search](https://www.google.com/webhp?sourceid=chrome-instant&rlz=1C1GTPM_enUS601US601&ion=1&espv=2&ie=UTF-8#q=c%2B%2B%20parse%20xml%20using%20regex) you will find that [many people discourage](https://stackoverflow.com/questions/4122624/would-you-implement-a-lightweight-xml-parser-with-regex) trying to parse XML/HTML/etc. with regular expressions, because there are already more robust ways to do it. – CoryKramer
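
As a rough illustration of the parser route suggested in the comments, here is a sketch using `html.parser.HTMLParser` from the Python 3 standard library. The class name `TagPrinter` and its `lines` attribute are hypothetical names chosen for this example; the output format simply mirrors the one requested in the question.

```python
from html.parser import HTMLParser


class TagPrinter(HTMLParser):
    """Collects 'Open tag' / 'Parm' lines for every start tag fed in."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None when
        # the attribute has no value (e.g. <input disabled>).
        self.lines.append('Open tag: ' + tag)
        for name, value in attrs:
            parm = name if value is None else '{}={}'.format(name, value)
            self.lines.append('Parm: ' + parm)


parser = TagPrinter()
parser.feed('<b size=5 alt=ref>')
parser.close()
print('\n'.join(parser.lines))
```

Because the parser handles quoting and whitespace itself, the same class can be fed the contents of a whole HTML file rather than a single tag.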

import re

tag_names = ["Open tag:", "Parm:", "Parm:"]
# Split on '<', '>' and spaces, then drop the empty strings that
# re.split leaves at the start and end of the resulting list.
tags = re.split(r'[<> ]', '<b size=5 alt=ref>')[1:-1]
# Pair each label in tag_names with the corresponding part of the tag.
print(list(zip(tag_names, tags)))

[('Open tag:', 'b'), ('Parm:', 'size=5'), ('Parm:', 'alt=ref')]
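
One limitation of the snippet above is that `tag_names` is hard-coded with three labels, so `zip` would silently drop any attribute after the second. A possible generalization, sketched under the assumption that the first part is always the tag name and everything after it is a parameter (the function name `label_tag` is hypothetical):

```python
import re
from itertools import chain, repeat


def label_tag(tag):
    # Split on angle brackets and any whitespace, then discard the
    # empty strings produced at the start and end of the list.
    parts = [p for p in re.split(r'[<>\s]+', tag) if p]
    # 'Open tag:' once, then 'Parm:' for as many parts as remain;
    # zip stops when parts is exhausted.
    labels = chain(['Open tag:'], repeat('Parm:'))
    return list(zip(labels, parts))


print(label_tag('<b size=5 alt=ref>'))
print(label_tag('<font size=5 color=red face=arial>'))
```

The second call returns four pairs, one label per part, regardless of how many attributes the tag carries.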

2015-10-14 15:59:18 LetzerWille

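While this answer may be correct, please add some explanation. Conveying the underlying logic is more important than just giving the code, because it helps the OP and other readers solve this and similar problems themselves. – CodeMouse92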

>>> import re 
>>> temp = '<b size=5 alt=ref>' 
>>> resList = re.findall(r"\S+", temp.replace("<", "").replace(">", ""))
>>> myDict = {} 
>>> myDict["Open tag:"] = [resList[0]] 
>>> myDict["Parm:"] = resList[1:] 
>>> myDict 
{'Open tag:': ['b'], 'Parm:': ['size=5', 'alt=ref']}

2015-10-14 17:28:49
