Using BeautifulSoup to process string content values in an HTML file

I'm new to Python and BeautifulSoup, so please bear with me. I'm trying to do some HTML parsing.

Based on a string search in the HTML file, I want to remove newlines and collapse whitespace in the text of the matching tags.

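For example, for the HTML below, I want to find every tag whose string contains "xy", then remove the newlines and runs of multiple spaces from that string (replacing them with a single space).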

<html> 
    <head></head> 
    <body> 
    <h1>xy 
     z</h1> 
    <p>xy 
     z</p> 
    <div align="center" style="margin-left: 0%; "> 
     <b> 
     <font style="font-family: 'Times New Roman', Times"> 
     ab c 
     </font> 
     <font style="font-family: 'Times New Roman', Times"> 
     xy z 
     </font> 
     </b> 
    </div> 
    </body> 
</html>

The resulting HTML should look like:

<html> 
    <head></head> 
    <body> 
    <h1>xy z</h1> 
    <p>xy z</p> 
    <div align="center" style="margin-left: 0%; "> 
     <b> 
     <font style="font-family: 'Times New Roman', Times"> 
     ab c 
     </font> 
     <font style="font-family: 'Times New Roman', Times"> 
     xy z 
     </font> 
     </b> 
    </div> 
    </body> 
</html>

Source

2011-02-18 serverman

OK, I found a way to do this: use findAll to collect the matching strings, then call replaceWith() on each one, as shown below.

.........
soup = BeautifulSoup(content)
s = soup.findAll(text=re.compile("xy"))
for s1 in s:
    # collapse each run of whitespace (newlines included) into a single space
    s1.replaceWith(re.sub(r'\s+', ' ', str(s1)))
...........
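
For anyone who wants to try this as a complete script, here is a minimal sketch of the same idea. It assumes the bs4 package (BeautifulSoup 4) and its built-in "html.parser" backend, neither of which is named above, and it uses the find_all / replace_with spellings from that version. The function name collapse_whitespace and its needle parameter are made up for illustration.

```python
import re

from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 is installed


def collapse_whitespace(html, needle="xy"):
    """Collapse whitespace runs in every text node that matches `needle`.

    `collapse_whitespace` and `needle` are hypothetical names used only
    for this sketch.
    """
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list, so replacing nodes while looping is safe.
    for node in soup.find_all(string=re.compile(needle)):
        # Replace each run of whitespace (newlines included) with one space.
        node.replace_with(re.sub(r"\s+", " ", str(node)))
    return str(soup)


if __name__ == "__main__":
    sample = "<h1>xy \n     z</h1><p>ab \n c</p>"
    print(collapse_whitespace(sample))
```

Note that this touches every text node containing "xy", so text that sits on its own indented line inside a tag (like the second font tag in the question) would also have its leading and trailing newlines reduced to single spaces. If that matters, the loop could skip nodes based on their parent tag.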

Source

2011-02-18 19:56:09 serverman
