2014-03-19 80 views

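Remove style from specific tags with BeautifulSoup/Python

Suppose I have a soup and want to remove the style attribute from every paragraph. That is, throughout the soup, I want to turn <p style='blah' id='bla' class=...> into <p id='bla' class=...>. But I don't want to touch, say, <img style='...'> tags. How would I do that?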


For anyone who needs to remove the class from specific tags (Python 3): for x in soup.findAll("p", class_="MsoNormal"): del x['class'] – JinSnow
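The one-liner in the comment above can be expanded into a runnable sketch. The sample HTML and the ids "a" and "b" are made up for illustration, and find_all is used here as the current spelling of findAll.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not from the original question
html = '<p class="MsoNormal" id="a">one</p><p class="other" id="b">two</p>'
soup = BeautifulSoup(html, "html.parser")

# class_ carries a trailing underscore because class is a Python keyword
for x in soup.find_all("p", class_="MsoNormal"):
    # Deletes the whole class attribute, not just the "MsoNormal" value
    del x["class"]

print(soup)
```

Note that class_="MsoNormal" also matches a tag whose class list merely contains MsoNormal, and del then drops every class on that tag.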

Answer


The idea is to iterate over all p tags with find_all('p') and delete the style attribute from each one:

from bs4 import BeautifulSoup 


data = """ 
<body> 
    <p style='blah' id='bla1'>paragraph1</p> 
    <p style='blah' id='bla2'>paragraph2</p> 
    <p style='blah' id='bla3'>paragraph3</p> 
    <img style="awesome_image"/> 
</body>""" 


soup = BeautifulSoup(data, 'html.parser') 
for p in soup.find_all('p'):
    # only <p> tags are visited, so the <img> style is left alone
    if 'style' in p.attrs:
        del p.attrs['style']

print(soup.prettify())

Prints:

<body>
 <p id="bla1">
  paragraph1
 </p>
 <p id="bla2">
  paragraph2
 </p>
 <p id="bla3">
  paragraph3
 </p>
 <img style="awesome_image"/>
</body>
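If the same cleanup is needed for other tags or attributes, the loop can be wrapped in a small helper. This is only a sketch: strip_attribute is a hypothetical name, not part of BeautifulSoup, and the sample HTML is made up. It relies on find_all accepting attrs={name: True} to match only tags that actually carry the attribute, which makes the explicit 'style' in p.attrs check unnecessary.

```python
from bs4 import BeautifulSoup


def strip_attribute(soup, tag_name, attr_name):
    """Remove attr_name from every tag_name element; return how many were changed.

    Hypothetical helper for illustration, not a BeautifulSoup API.
    """
    # attrs={attr_name: True} matches only tags that have the attribute set
    matches = soup.find_all(tag_name, attrs={attr_name: True})
    for tag in matches:
        del tag[attr_name]
    return len(matches)


# Hypothetical sample markup
html = "<body><p style='a' id='p1'>one</p><p id='p2'>two</p><img style='b'/></body>"
soup = BeautifulSoup(html, "html.parser")
removed = strip_attribute(soup, "p", "style")
print(removed)
print(soup)
```

Because only tags named tag_name are selected, the style on the img tag is untouched, as in the answer above.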