How do I save a parsed HTML file as a TXT file?

I parsed a web page that displays an article. I want to save the parsed data as a text file, but my Python shell shows this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 107: ordinal not in range(128)

Here is part of my code:

search_result = urllib.urlopen(url)
f = search_result.read()
# XML parsing
parsedResult = xml.dom.minidom.parseString(f)
linklist = parsedResult.getElementsByTagName('link')  # extract links
extractedURL = linklist[3].firstChild.nodeValue  # pick one link
page = urllib.urlopen(extractedURL).read()
# write the HTML file
g = open('yyyy.html', 'w')
g.write(page)
g.close()
# read the HTML file and parse it to get the plain text of the article
g = open('yyyy.html', 'r')
bs = BeautifulSoup(g, fromEncoding="utf-8")
g.close()
article = bs.find(id="articleBody")
content = article.get_text()
# save as a text file
h = open('yyyy.txt', 'w')
h.write(content)
h.close()

What do I need to add to make this work?

2013-05-08 user2351602

Open the output file with codecs.open and an explicit UTF-8 encoding:

import codecs 
h = codecs.open('yyyy.txt', 'w', 'utf-8')

Or use Python 3, where the built-in open() accepts an encoding argument.
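As a rough illustration of the Python 3 route, the sketch below assumes Python 3 and uses a made-up sample string in place of the real article text. The temporary output path is also an assumption chosen only so the example can run anywhere. It shows that encoding the text as ASCII fails on U+2019, while writing with an explicit UTF-8 encoding succeeds.

```python
import os
import tempfile

# Sample text containing U+2019 (right single quotation mark),
# the character named in the error message. This stands in for
# the `content` extracted from the article.
content = "It\u2019s an article body."

# Encoding to ASCII fails; this is the conversion Python 2 attempted
# implicitly when unicode text was written to a plain file object.
try:
    content.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Writing with an explicit UTF-8 encoding works.
# The temp-directory path is a hypothetical location for this sketch.
out_path = os.path.join(tempfile.gettempdir(), "yyyy.txt")
with open(out_path, "w", encoding="utf-8") as h:
    h.write(content)

# Read the file back to confirm the text survived the round trip.
with open(out_path, "r", encoding="utf-8") as h:
    round_trip = h.read()
```

Using a with block also closes the file automatically, so the explicit h.close() call is no longer needed.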

2013-05-08 16:44:30

Thanks. You solved my problem :) – user2351602 2013-05-08 16:51:22

Try using unidecode:

from unidecode import unidecode 

unidecode(page)

2013-05-08 16:21:33 nnaelle

My Python doesn't have that module. How do I get it? – user2351602 2013-05-08 16:24:51

Sorry, you will find it here: [link](https://pypi.python.org/pypi/Unidecode) – nnaelle 2013-05-08 16:26:33

It still doesn't work. Same error message. – user2351602 2013-05-08 16:29:26
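The thread does not show the code that was actually run with unidecode, so the cause of the repeated error is not known. One plausible explanation, stated here as an assumption, is that the converted result was never used: the snippet calls the function on page and discards the return value, while the script still writes the unconverted content to the file. The sketch below assumes Python 3 and does not use unidecode at all. It swaps in a standard-library approach that drops non-ASCII characters instead of transliterating them, and to_ascii is a hypothetical helper name. The point is only that the conversion must be applied to the text being written and its return value kept.

```python
import unicodedata


def to_ascii(text):
    """Return an ASCII-only approximation of text (hypothetical helper).

    Characters with a compatibility decomposition keep their base letter;
    anything else that is not ASCII is dropped, not transliterated.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Sample text standing in for the extracted article body.
content = "It\u2019s a caf\u00e9"

# Keep the return value and write this, not the original content.
ascii_text = to_ascii(content)
```

This loses information (the curly apostrophe disappears entirely), so writing the file as UTF-8 is usually the better fix when the full text matters.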
