get_text() raises UnicodeEncodeError. I have the following HTML:
<div class="dialog">
<div class="title title-with-sort-row">
<h2>Description</h2>
<div class="dialog-search-sort-bar">
</div>
</div>
<div class="content"><div style="margin-right: 20px; margin-left: 30px;">
<span class="description2">
With 「Antonia Polygon – Standard」, you have a figure that is unique in the Poser community.
She is made available under a Creative Commons License that gives endless opportunities for further development.
This figure was developed by a group of talented members of the Poser community in a thirty-month effort.
The result is a figure that has very good bending and morphing behavior.
<br />
</span>
</div>
</div>
I need to find this div among several divs with class="dialog", then pull out the text inside the span with class="description2".
When I use this code:
description = soup.find(text = re.compile('Description'))
if description != None:
    someEl = description.parent
    parent1 = someEl.parent
    parent2 = parent1.parent
    description = parent2.find('span', {'class' : 'description2'})
    print 'Description: ' + str(description)
I get:
<span class="description2">
With Â「Antonia Polygon – StandardÂ」, you have a figure that is unique in the Poser community.
She is made available under a Creative Commons License that gives endless opportunities for further development.
This figure was developed by a group of talented members of the Poser community in a thirty-month effort.
The result is a figure that has very good bending and morphing behavior.
<br/>
</span>
If I try to get just the text, without the HTML and without the non-ASCII characters, by using
description = description.get_text()
I get a UnicodeEncodeError: 'ascii' codec can't encode character u'\x93'
How can I convert this block of HTML to plain ASCII?
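The opening quotation mark in your text is not an ASCII character. Is your goal to identify the most similar ASCII character (which is difficult), or simply to remove all non-ASCII characters? Or do you actually want to output proper Unicode, e.g. UTF-8, rather than ASCII? – jogojapan 2012-04-23 02:04:45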
Just remove all non-ASCII characters – Stephen 2012-04-24 21:44:22
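If dropping every non-ASCII character really is the goal, one possible approach is to encode the text to ASCII with the 'ignore' error handler, which silently discards anything that cannot be encoded. The sketch below is only an illustration: strip_non_ascii and the sample string are hypothetical names and data, with u'\x93' taken from the error message above and u'\x94' and the en dash assumed as stand-ins for the other offending characters.

```python
def strip_non_ascii(text):
    """Return text with every non-ASCII character removed."""
    # 'ignore' drops characters that have no ASCII encoding instead of raising
    # UnicodeEncodeError; decoding again turns the bytes back into text.
    return text.encode('ascii', 'ignore').decode('ascii')


# Hypothetical sample: u'\x93' comes from the error message in the question,
# u'\x94' and the en dash (u'\u2013') are assumed for illustration.
sample = u'With \x93Antonia Polygon \u2013 Standard\x94, you have a figure'
cleaned = strip_non_ascii(sample)
print(cleaned)
```

Note that the characters are removed, not replaced, so the space on each side of the dropped en dash remains and the result contains a double space at that point.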
Obligatory: http://bit.ly/unipain – Daenyth 2012-05-07 12:55:56
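For the other option raised in the comments, outputting proper UTF-8 rather than ASCII, a minimal sketch is to encode the text explicitly at the point where it leaves the program, so the implicit 'ascii' codec is never involved. The helper name to_utf8 and the sample string are hypothetical; the curly quotes and en dash are assumed stand-ins for the characters in the description.

```python
def to_utf8(text):
    """Encode text to UTF-8 bytes explicitly, instead of relying on the default codec."""
    return text.encode('utf-8')


# Hypothetical sample with curly quotes (U+201C, U+201D) and an en dash (U+2013).
sample = u'\u201cAntonia Polygon \u2013 Standard\u201d'
encoded = to_utf8(sample)

# Decoding the bytes again gives back the original text, so nothing is lost.
round_tripped = encoded.decode('utf-8')
print(round_tripped == sample)
```

In the Python 2 code from the question, this would presumably mean calling .encode('utf-8') on the result of get_text() before concatenating it with 'Description: ' and printing it, instead of passing it through str().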