如何在Python中使用帶有列表或字符串的unicode

所以我有一些愛爾蘭語（蓋爾詞）的單詞列表，我想使用unicode，以便RDFlib能夠理解上面的一些重音符號字中的字母。我不知道是否在列表中或之後使用unicode。這裏是我的代碼至今：在文件如何在Python中使用帶有列表或字符串的unicode

樣品線= 00001740 n 3 eintiteas aonán beith 003 ~ 00001930 n 0000

def process_file(self): 
    self.file = open("testing_line_ir.txt", "r") 
    return self.file 

def line_for_loop(self, file): 
    for line in file: 
     self.myline = unicode(line, 'utf-8') 
     for line in self.myline: 
     ............here is where other processes are ran.......

這是給了錯誤：

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 26: invalid continuation byte

我也有試過這樣：

def get_words_list(self, word_part, num_words): 
    self.word = word_part[3:3 + num_words:1] 
    self.myword = [unicode(i) for i in self.word] 
    return self.myword

在這種情況下，'word'是單詞['eintiteas'，'aonán'，'beith']的列表，我嘗試使用g myword作爲編碼列表，具有與上述相同的錯誤。

編輯：這裏是從發生錯誤的源代碼，它發生在graph.parse線穿過像塊1和命名空間的變量是隻是文字

def compose_printout(self, namespaces, block1, block2, close_rdf): 
    self.printout += namespaces + block1 + block2 + close_rdf 
    self.tabfile = StringIO(self.printout) 
    return self.tabfile 

def serialize(self, graph, tabfile): 
    """ This will serialize with RDFLib """ 
    graph.parse(tabfile, publicID=None, format="xml")

其中的一些線字被添加到一個RDFlib圖，所以這裏的任何幫助將是偉大的！

來源

2013-07-16 Johnnerz

您的輸入數據是**不是** UTF-8編碼。您需要找到用於文件數據的正確編解碼器。 –

-1

你可以嘗試在你的文件頭

import sys 
reload(sys) 
sys.setdefaultencoding('utf8')

來源

2013-07-16 14:52:45 SFeng

這是一個壞主意，原因很多。更改用於Python字符串強制的默認編碼不是解決方案。真正的解決方案是在需要時顯式編碼或解碼。 –

此外，OP已*試圖明確解碼，但正在使用錯誤的編解碼器。如何將默認編碼設置爲UTF-8將在那裏幫助？ –

這些代碼不必UTF-8的數據。從異常消息我會說你的Latin-1編碼的數據，而不是：

>>> print '\xe1'.decode('latin1') 
á

可以使用codecs.open() function創建返回文件數據準備解碼的文件對象：

import codecs 

def process_file(self): 
    self.file = codecs.open("testing_line_ir.txt", "r", 'latin-1') 
    return self.file 

def line_for_loop(self, file): 
    for line in file: 
     # line is *already* unicode

來源

2013-07-16 15:02:29

現在得到這個錯誤：'UnicodeEncodeError：'ascii'編解碼器無法編碼字符u'\ xe1'在位置331：序號不在範圍內（128）' – Johnnerz

@Johnnerz：您在某處混合unicode和字節字符串值。不要那樣做，這些類型之間的隱性強制以醜陋的方式失敗。 –

我如何知道我在做什麼？這對我來說完全是陌生的，我只在2個月前開始Python！ – Johnnerz

如何在Python中使用帶有列表或字符串的unicode

回答

相關問題