2010-10-20 41 views

Answer


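This answer is simple and probably misses many nuances. However, it should give you an idea of how to go about it and what to improve. I'm sure it can be improved, but with the help of the documentation you should be able to do that quickly.

Reference documentation: http://www.crummy.com/software/BeautifulSoup/documentation.html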

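from bs4 import BeautifulSoup

# The closing </p> and </body> tags are written out so the result does not
# depend on how a particular parser repairs unclosed tags.
doc = ['<html><script type="text/javascript">document.write("Hello World!")',
       '</script><head><title>Page title</title></head>',
       '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.</p>',
       '<p id="secondpara" align="blah">This is paragraph <b>two</b>.</p>',
       '</body></html>']

# Name the parser explicitly; "html.parser" ships with Python.
soup = BeautifulSoup(''.join(doc), 'html.parser')

for tag in soup.find_all('script'):
    # extract() removes the tag from the tree and returns it
    tag.extract()
    # append() re-inserts it as the last child of <body>
    soup.body.append(tag)

print(soup.prettify())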

Output:

<html> 
<head> 
    <title> 
    Page title 
    </title> 
</head> 
<body> 
    <p id="firstpara" align="center"> 
    This is paragraph 
    <b> 
    one 
    </b> 
    . 
    </p> 
    <p id="secondpara" align="blah"> 
    This is paragraph 
    <b> 
    two 
    </b> 
    . 
    </p> 
    <script type="text/javascript"> 
    document.write("Hello World!") 
    </script> 
</body> 
</html> 
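
If you need this for more than one page, the same extract-and-reinsert idea can be wrapped in a small function. The following is only a sketch: the function name `move_scripts_to_body_end`, the sample markup, and the choice to leave a document without a `<body>` untouched are all assumptions made for illustration, not something the question specified.

```python
from bs4 import BeautifulSoup


def move_scripts_to_body_end(html):
    """Return html with every <script> element moved to the end of <body>.

    Hypothetical helper; the name and the no-<body> behaviour are assumptions.
    """
    soup = BeautifulSoup(html, "html.parser")
    if soup.body is None:
        # There is nothing to move the scripts into, so leave the markup alone.
        return str(soup)
    for script in soup.find_all("script"):
        # extract() detaches the tag and returns it;
        # append() re-attaches it as the last child of <body>.
        soup.body.append(script.extract())
    return str(soup)


# Made-up sample page with one script in <head> and one in <body>.
page = (
    '<html><head><script src="a.js"></script><title>T</title></head>'
    '<body><script>var x = 1;</script><p>Hello</p></body></html>'
)
result = move_scripts_to_body_end(page)
print(result)
```

Because `find_all` returns the scripts in document order and each one is appended in turn, they should end up after the paragraph in their original relative order, which usually matters when one script depends on another.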