2015-09-06 14 views
6

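Python HTML encoding \xc2\xa0

I have been struggling with this for a while. I am trying to write strings to HTML, but once I have cleaned them I run into formatting problems. Here is an example: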

paragraphs = ['Grocery giant and household name Woolworths is battered and bruised. ', 
'But behind the problems are still the makings of a formidable company'] 

x = str(" ") 
for item in paragraphs: 
    x = x + str(item) 
x 

Output:

"Grocery giant and household name\xc2\xa0Woolworths is battered and\xc2\xa0bruised. 
But behind the problems are still the makings of a formidable\xc2\xa0company" 

Desired output:

"Grocery giant and household name Woolworths is battered and bruised. 
But behind the problems are still the makings of a formidable company" 

I hope you can explain why this happens and how I can fix it. Thanks in advance!

+2

Have you checked the source strings for unusual Unicode whitespace characters?

Answers

14

\xc2\xa0 is the byte sequence 0xC2 0xA0, which is the so-called non-breaking space.

It is an invisible whitespace character (U+00A0), and 0xC2 0xA0 is its UTF-8 encoding. For more information, see Wikipedia: https://en.wikipedia.org/wiki/Non-breaking_space
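To see the relationship between the character and the two bytes, here is a minimal sketch. It assumes Python 3, whereas the question's output looks like Python 2 byte strings; the variable names are only illustrative.

```python
import unicodedata

# U+00A0 is the non-breaking space character.
nbsp = "\u00a0"

# Encoding it as UTF-8 yields the two bytes seen in the question's output.
encoded = nbsp.encode("utf-8")
print(encoded)                 # b'\xc2\xa0'

# The Unicode database confirms what the character is.
print(unicodedata.name(nbsp))  # NO-BREAK SPACE

# Decoding the bytes gives the single character back.
decoded = encoded.decode("utf-8")
print(decoded == nbsp)         # True
```

In other words, \xc2\xa0 appears when a UTF-8 encoded non-breaking space is shown as raw bytes rather than as decoded text.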

I copied the code from your question as pasted and got the expected output.

+5

Thanks. That solved it. I added: x.replace("\xc2\xa0", " ")
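For anyone on Python 3, the same replace idea might look like the sketch below. It assumes the scraped paragraphs arrive as UTF-8 bytes containing non-breaking spaces; the helper name replace_nbsp and the sample data are hypothetical, reconstructed from the output shown in the question. Note that replace returns a new string, so the result has to be kept.

```python
def replace_nbsp(text):
    """Return text with non-breaking spaces replaced by ordinary spaces.

    Hypothetical helper: accepts either UTF-8 encoded bytes or str.
    """
    if isinstance(text, bytes):
        # Decode first so the NBSP is one character, not two raw bytes.
        text = text.decode("utf-8")
    return text.replace("\u00a0", " ")


# Sample data modeled on the question's output (assumed, not the real source).
paragraphs = [
    b"Grocery giant and household name\xc2\xa0Woolworths is battered and\xc2\xa0bruised. ",
    b"But behind the problems are still the makings of a formidable\xc2\xa0company",
]

# str.join avoids repeated string concatenation in a loop.
cleaned = "".join(replace_nbsp(p) for p in paragraphs)
print(cleaned)
```

Decoding before replacing keeps the fix independent of how the bytes happen to be displayed. If the target is HTML and the non-breaking spaces are intentional, leaving them in and writing the file as UTF-8 is another option.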