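How do I insert a non-breaking space (&nbsp;) into a BeautifulSoup tag?

I am trying to add '&nbsp;' to a BeautifulSoup tag. BS converts tag.string to \&amp;nbsp; instead of &nbsp;. It must be some kind of encoding issue, but I can't figure it out.

Please note: ignore the '\' character. I had to add it so that Stack Overflow would format my question correctly.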

from bs4 import BeautifulSoup

html = "<td><span></span></td>"
soup = BeautifulSoup(html, "html.parser")
tag = soup.find("td")
tag.string = "&nbsp;"

The current output is html = "\&amp;nbsp;"

Any ideas?

2014-03-05 fat fantasma

How are you printing the output? – shaktimaan

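By default, BeautifulSoup uses the "minimal" output formatter, which escapes bare ampersands and angle brackets. That is why the literal text &nbsp; is written out as &amp;nbsp;.

The solution is to set the output formatter to None. Quoting the BS source (the PageElement docstring):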

# There are five possible values for the "formatter" argument passed in 
# to methods like encode() and prettify(): 
# 
# "html" - All Unicode characters with corresponding HTML entities 
# are converted to those entities on output. 
# "minimal" - Bare ampersands and angle brackets are converted to 
# XML entities: &amp; &lt; &gt; 
# None - The null formatter. Unicode characters are never 
# converted to entities. This is not recommended, but it's 
# faster than "minimal".
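To make the difference between these formatters concrete, here is a minimal sketch. It assumes bs4 is installed and uses the standard-library html.parser backend; the variable names are only illustrative. It serializes the same tree twice, once with the default "minimal" formatter and once with the null formatter.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<td><span></span></td>", "html.parser")
span = soup.find("span")
# This is literal text (ampersand, n, b, s, p, semicolon), not an entity.
span.string = "&nbsp;"

# Default formatter: the bare ampersand is escaped to &amp;
minimal_out = soup.decode(formatter="minimal")
# Null formatter: the text is written out exactly as stored
raw_out = soup.decode(formatter=None)

print(minimal_out)
print(raw_out)
```

Note that with formatter=None nothing is escaped at all, so any other text in the document containing a bare & or < is also written out unescaped.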

Example:

from bs4 import BeautifulSoup

html = "<td><span></span></td>"
soup = BeautifulSoup(html, 'html.parser')
tag = soup.find("span")
tag.string = '&nbsp;'

# formatter=None disables entity substitution, so "&nbsp;" is written as-is
print(soup.prettify(formatter=None))

Prints:

<td>
 <span>
  &nbsp;
 </span>
</td>

Hope that helps.

2014-03-05 02:25:22 alecxe

Perfect! I had found the answer as well. Thanks for your reply! –

You need to add the Unicode non-breaking space, which can be written in Python as "\xa0":

from bs4 import BeautifulSoup

soup = BeautifulSoup("", "html5lib")  # html5lib adds the html, head and body tags by default
soup.body.string = "\xa0"  # Unicode non-breaking space (U+00A0)
print(soup.encode("ascii"))  # show the final HTML in ASCII encoding

Result:

b'<html><head></head><body>&#160;</body></html>'
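If the named entity is wanted rather than the numeric reference, one option is to store "\xa0" in the tree and serialize with the "html" formatter. The sketch below assumes bs4 is installed and uses html.parser instead of html5lib only to avoid the extra dependency; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<td><span></span></td>", "html.parser")
soup.find("span").string = "\xa0"  # U+00A0 NO-BREAK SPACE stored as a real character

# "html" formatter: characters with a named HTML entity are written as that entity
named = soup.decode(formatter="html")
# ASCII encoding: characters outside ASCII become numeric character references
numeric = soup.encode("ascii")

print(named)
print(numeric)
```

Storing the real character keeps the tree's text correct for searching and comparison, and leaves the choice between &nbsp; and &#160; to the output step.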

2016-08-12 21:21:06 Muposat
