Python將Unicode保存爲XML

我目前正在編寫一個簡短的Python腳本來瀏覽服務器上的某些目錄，找到我要查找的內容並將數據保存到XML文件中。Python將Unicode保存爲XML

問題是某些數據是用其他語言寫的，如「ハローワールド」或類似格式的東西。當試圖挽救這個在我的XML中的條目，我得到以下回溯：

UnicodeDecodeError: 'ascii' codec can't decode byte 0xec in position 18: ordinal not in range(128)

這是我的功能，節省了數據的模樣：

def addHistoryEntry(self, title, url): 
    self.log.info('adding history entry {"title":"%s", "url":"%s"}' % (title, url)) 
    root = self.getRoot() 
    history = root.find('.//history') 
    entry = etree.SubElement(history, 'entry') 
    entry.set('title', title) 
    entry.set('time', str(unixtime())) 
    entry.text = url 
    history.set('results', str(int(history.attrib['results']) + 1)) 
    self.write(root)

self.getRoot()如下：

def getRoot(self): 
    return etree.ElementTree(file = self.config).getroot()

，這裏是寫入的數據的功能（self.write(root)）

def write(self, xmlRoot): 
    bump = open(self.config, 'w+') 
    bump.write(dom.parseString(etree.tostring(xmlRoot, 'utf-8')).toxml()) 
    bump.close()

個

進口：

import xml.etree.ElementTree as etree 
import xml.dom.minidom as dom

如果有人可以幫助我這個問題，請做。感謝您的幫助。

來源

2014-06-07 Ritashugisha

使用'解碼（ 'UTF-8'）' –

使用'codecs.open（）' - http://stackoverflow.com/questions/934160/write-to-utf-8-file-在-蟒蛇 – johntellsall

您可以使用Python的編解碼器庫，並.decode（ 'UTF-8'），以解決這個問題。

import sys,codecs 
reload(sys) 
sys.setdefaultencoding("utf-8")

來源

2014-06-08 03:26:18 user3718733

默認情況下，python會將unicode編碼爲ASCII。由於在ASCII中只支持128個字符，因此它會將異常值提升到128以上。所以解決方案是將您的unicode編碼爲擴展格式，如latin1或UTF8或其他ISO字符集。我建議用UTF-8編碼。語法是：

u"Your unucode Text".encode("utf-8", "ignore")

爲您

title.encode("utf-8", "ignore")

來源

2014-06-08 00:43:32

Python將Unicode保存爲XML

回答

相關問題