Remove attributes from HTML tags

Possible duplicates:
php: how can I remove attributes from an html tag?
How do I iterate over the HTML attributes of a Beautiful Soup element?

I have some HTML like this:

<div class="foo"> 
    <p id="first">Hello, world!</p> 
    <p id="second">Stack Overflow</p> 
</div>

It needs to come out like this:

<div> 
    <p>Hello, world!</p> 
    <p>Stack Overflow</p> 
</div>

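I would prefer a Python solution, since I already use BeautifulSoup in the program that needs this. However, I am open to PHP if that would be a better solution. I don't think a sed regular expression would be enough, especially since a < symbol may appear in the text in the future (I don't control the input).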


2011-08-22 Rory

See also [how-do-i-iterate-over-the-html-attributes-of-a-beautiful-soup-element](http://stackoverflow.com/questions/822571/how-do-i-iterate-over-the-html-attributes-of-a-beautiful-soup-element), [python-how-to-search-and-correct-html-tags-and-attributes](http://stackoverflow.com/questions/3360968/python-how-to-search-and-correct-html-tags-and-attributes) and [python-extracting-html-tag-attributes-without-regular-expressions](http://stackoverflow.com/questions/7141431/python-extracting-html-tag-attributes-without-regular-expressions) – agf

What have you tried? (Please don't try to use regular expressions, especially if you already know how to use an HTML parser like Beautiful Soup.) – geoffspear

I tried using a regular expression, but it got very long and went wrong somewhere. – Rory

This also works with sed: match <([a-zA-Z!]+)[^>]+> and then replace it with only the first group, i.e. <\1>


2011-08-22 16:47:27 xob
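
For comparison, here is a minimal Python sketch of the same substitution idea. It is not part of the answer above, and the names TAG_WITH_ATTRS and strip_attributes_regex are made up for illustration. One assumed tweak: the pattern requires whitespace after the tag name, because the pattern as quoted can also match an attribute-free tag such as <div> and shorten it to <di>. It still misfires when a > appears inside an attribute value or when the text contains something that looks like a tag, so treat it as a quick hack rather than a parser.

```python
import re

# Hypothetical pattern: a tag name, then whitespace, then anything up to '>'.
# Requiring the whitespace leaves attribute-free tags such as <div> untouched.
TAG_WITH_ATTRS = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)\s[^>]*>")


def strip_attributes_regex(html: str) -> str:
    """Replace each opening tag that carries attributes with the bare tag."""
    return TAG_WITH_ATTRS.sub(r"<\1>", html)


sample = '<div class="foo"><p id="first">Hello, world!</p></div>'
print(strip_attributes_regex(sample))
```

Closing tags are never touched, since the character after < must be a letter.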

This is easy to do in Python using lxml.

First install lxml, then try the following code:

from lxml.html import tostring, fromstring 

html = ''' 
<div class="foo"> 
    <p id="first">Hello, world!</p> 
    <p id="second">Stack Overflow</p> 
</div>''' 

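# Parse the fragment; fromstring returns the root <div> element.
htmlElement = fromstring(html)
# '*' matches every element, including the root (needs the cssselect package).
for element in htmlElement.cssselect('*'):
    # Drop all attributes of this element.
    element.attrib.clear()

# encoding='unicode' makes tostring return a str rather than bytes.
result = tostring(htmlElement, encoding='unicode')

print(result)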


2011-08-22 16:55:40 enderskill
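
If installing a third-party library is not an option, roughly the same result can be sketched with the standard library's html.parser. This is only an illustration under some assumptions: AttributeStripper and strip_attributes are hypothetical names, and the sketch re-emits tags, text and entities only, so comments and doctype declarations are silently dropped. Because it tokenizes the input instead of pattern-matching it, a stray < in the text (the concern raised in the question) is passed through as data.

```python
from html.parser import HTMLParser


class AttributeStripper(HTMLParser):
    """Re-emit parsed HTML without any attributes (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=False so entities such as &lt; are passed through as-is.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Ignore attrs entirely and emit the bare opening tag.
        self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> keep their form, minus attributes.
        self.parts.append(f"<{tag}/>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def strip_attributes(html: str) -> str:
    """Return html with every attribute removed from every tag."""
    parser = AttributeStripper()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of input
    return "".join(parser.parts)


print(strip_attributes('<div class="foo"><p id="first">Hello, world!</p></div>'))
```

Note that html.parser lowercases tag names, so the output can differ in case from the input.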
