2013-06-04 46 views

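I have been trying to write a Python program that reads cell values from an Excel file, translates the cell contents from Estonian into English or Russian, and joins them into a single string. The result is written to a text file. Estonian -> English seems to work fine, but with Russian, errors start to appear: UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 8: ordinal not in range(128)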

Traceback (most recent call last):
  File "erid.py", line 140, in <module>
    f.write(aNimed(row_index, 1, 'ru')+ '\n')
  File "erid.py", line 120, in aNimed
    nimi += komponendid[i].strip()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 8: ordinal not in range(128)

Traceback (most recent call last):
  File "erid.py", line 140, in <module>
    f.write(aNimed(row_index, 1, 'ru')+ '\n')
  File "erid.py", line 120, in aNimed
    nimi = nimi + komponendid[i][1:].strip()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 9: ordinal not in range(128)

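The first error is triggered by the word "antibakteriaalne" and the second by "+hoobkäepide". I suspect that in the second case the "+" sign is the cause of the failure, not the "a". Some Russian characters seem to be a problem while others are not. I am running out of ideas.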

Python code:

# -*- coding: utf-8 -*- 
from xlrd import open_workbook, cellname, XL_CELL_TEXT 
from xlwt import Workbook 
from xlutils.copy import copy 
import sonaraamatud #dictionary 

# open file with data 
book = open_workbook('Datafile.xls') 
# Safe write unicode to ignore unicode errors 
# http://developer.plone.org/troubleshooting/unicode.html 
def safe_write(failName, word):
    if type(word) == str:
        failName.write(word + '\n')
    else:
        failName.write(word.decode("utf-8") + '\n')

def safeDecode(word):
    if type(word) == str:
        word = unicode(word, 'utf-8', errors='ignore')
        return word
    else:
        word = unicode(word)
        return word

# Translate surface coating name
def translatePind(langa, langb, word):
    answ = ""
    if (sonaraamatud.kasOlemas3(langa, sonaraamatud.pinnaKatted) == True):
        answ = langa
        return answ
    # if langa is Estonian
    if (langa == 'et'):
        # if langb is English
        if (langb == 'en'):
            try:
                answ = sonaraamatud.pinnakattedEstEng[word]
            except KeyError:
                answ = word
        # if langb is Russian
        elif (langb == "ru"):
            try:
                answ = sonaraamatud.pinnakattedEngRus[sonaraamatud.pinnakattedEstEng[word]]
            except KeyError:
                answ = word

    # if langa is English
    elif (langa == "en"):
        # if langb is Estonian
        if (langb == "et"):
            try:
                answ = sonaraamatud.pinnakattedEngEst[word]
            except KeyError:
                answ = word
        # if langb is Russian
        elif (langb == "ru"):
            try:
                answ = sonaraamatud.pinnakattedEngRus[word]
            except KeyError:
                answ = "KeyError"
    return answ

def aNimed(row, sheetNr, lang):
    # Function combines name
    # name: aNimed
    # @param: row, sheet number
    # @return: Product name
    # select the sheet (worksheet)
    sheet = book.sheet_by_index(sheetNr) #sheetNr
    komponendid = []
    nimi = ""
    if (lang == 'et'):
        komponendid.append(str(sheet.cell(row, 5).value)) # model
        komponendid.append('(' + sheet.cell(row, 6).value + ')') # surface
        komponendid.append(sheet.cell(row, 7).value) # extras
    elif (lang == 'en'):
        komponendid.append(str(sheet.cell(row, 5).value)) # model
        komponendid.append('(' + translatePind('et', 'en', sheet.cell(row, 6).value) + ')')
        komponendid.append(sheet.cell(row, 7).value) # extras
    elif (lang == 'ru'):
        """
        Alternative method trying to use safeDecode, NOT working!
        komponendid.append(str(safeDecode(sheet.cell(row, 5).value))) # model
        surface= safeDecode(sheet.cell(row, 6).value)
        komponendid.append('(' + translatePind('et', 'ru', str(surface)) + ')')
        komponendid.append(safeDecode(sheet.cell(row, 7).value)) # extras
        """
        komponendid.append(str(sheet.cell(row, 5).value)) # model
        komponendid.append('(' + translatePind('et', 'ru',sheet.cell(row, 6).value) + ')')
        komponendid.append(sheet.cell(row, 7).value) # extras
    pikkus = len(komponendid)

    print(komponendid)
    for i in range(0, pikkus):
        if (komponendid[i] == "" or komponendid[i] == "()" or komponendid[i] == " "):
            i+=1
            continue
        elif (i == pikkus-1 and komponendid[i][0] != " "):
            print("1"+ komponendid[i])
            nimi += komponendid[i].strip()
            i+=1
        elif (komponendid[i][0] == " " and komponendid[i][1]== "+"):
            #print("2"+ komponendid[i])
            nimi = nimi + komponendid[i][1:].strip()
            i+=1
        else :
            #print("4"+ komponendid[i])
            nimi = nimi + komponendid[i].strip() + " "
            i+=1
    return nimi

# Use: aNimed(row, sheetNr, lang) 
sheet = book.sheet_by_index(7) 
f= open('data.txt', 'w') 
for row_index in range (1, sheet.nrows): 
    #print(aNimed(row_index, 5, 'en')) 
    f.write(aNimed(row_index, 1, 'ru')+ '\n') 
    #safe_write(f, aNimed(row_index, 1, 'ru')) 
f.close() 

Does it work if you use Unicode strings instead of byte strings? (i.e. u'\n' etc.)

I don't think I understand how I should use them... – kyng

Change every string literal from '' or "" to u'' and u"" so that you are using Unicode strings, and see whether any other errors appear. – User
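To make the suggestion in the comments concrete: in Python 2, adding a byte string to a Unicode string makes the interpreter decode the byte string as ASCII first, and 0xd0 and 0xd1 are the lead bytes of Russian letters in UTF-8. The sketch below performs that implicit step explicitly so that it also runs on Python 3. The sample strings and the helper `implicit_py2_concat` are made up for illustration and are not taken from erid.py.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def implicit_py2_concat(byte_part, text_part):
    """Mimic Python 2's `str + unicode`: the bytes are decoded as ASCII first."""
    return byte_part.decode("ascii") + text_part


# Hypothetical pieces: a name built from a UTF-8 encoded Russian translation
# (a byte string) and a text value such as a spreadsheet cell would hold.
accumulated = u"X1 (белый) ".encode("utf-8")
cell_value = u"antibakteriaalne"

try:
    implicit_py2_concat(accumulated, cell_value)
except UnicodeDecodeError as exc:
    # The failing byte sits in the accumulated byte string,
    # not in the word that was being appended.
    error_message = str(exc)
    print(error_message)

# Decoding with the real encoding before joining avoids the error.
fixed = accumulated.decode("utf-8") + cell_value
```

If this reading applies to the question, the word "antibakteriaalne" and the "+" sign would be incidental: the failure comes from the Russian bytes already collected, which would also explain why Estonian -> English works.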

Answer

This isn't particularly elegant, but I think I have a workaround. Read from a CSV file instead of from the Excel file. For example,

`import csv 
data = [] 
opened_file = open(csv_filename, 'rb') 
reader = csv.reader(opened_file) 
for row in reader: 
    data.append(row) 
opened_file.close()` 
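The snippet above opens the file in 'rb' mode, which is the Python 2 convention and yields byte strings. As a rough sketch of the same reading step on Python 3, assuming the CSV export is UTF-8 encoded, the file can be opened as text with an explicit encoding. The function name `read_rows` and the sample file content are hypothetical.

```python
# -*- coding: utf-8 -*-
import csv
import io
import os
import tempfile


def read_rows(csv_filename, encoding="utf-8"):
    """Return all rows of a CSV file as lists of text strings (Python 3)."""
    # newline="" lets the csv module handle line endings itself.
    with io.open(csv_filename, "r", encoding=encoding, newline="") as opened_file:
        return [row for row in csv.reader(opened_file)]


# Self-check with a temporary file holding made-up sample content.
csv_filename = os.path.join(tempfile.mkdtemp(), "sample.csv")
with io.open(csv_filename, "w", encoding="utf-8", newline="") as handle:
    handle.write(u"A100,kroom,+hoobkäepide\n")

data = read_rows(csv_filename)
```

Every cell then arrives as a text string, so no implicit ASCII decoding takes place when the pieces are joined later.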

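Now you have your data stored as a list. Do the translation and save the result as a different list, translated_data. Now, and this is the key, you can open a new workbook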

`from xlwt import Workbook
book = Workbook(encoding="utf8")
foo = book.add_sheet("foo")
for row_num in range(len(translated_data)):
    for col_num in range(len(translated_data[row_num])):
        foo.write(row_num, col_num, translated_data[row_num][col_num])
book.save("filename.xls")`

The key is that if you use Workbook(), you can specify the encoding, but if you use open_workbook(), it looks like you're stuck with ASCII.
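Since the question writes its result to a text file rather than to a workbook, the translate-and-save step can also be sketched without xlwt. This is a simplified illustration, not the asker's aNimed logic: `EST_TO_RUS` is a hypothetical stand-in for the sonaraamatud dictionaries, and `to_text`, `translate` and `build_name` are made-up helper names. The idea is to turn every piece into a text string first and to encode only once, when writing.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical stand-in for the dictionaries in sonaraamatud;
# keys and values are text strings, not byte strings.
EST_TO_RUS = {u"kroom": u"хром", u"valge": u"белый"}


def to_text(value, encoding="utf-8"):
    """Return value as a text string, decoding byte strings explicitly."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def translate(word, table):
    """Look the word up in the table; fall back to the word itself."""
    return table.get(word, word)


def build_name(model, surface, extras, table):
    """Join model, translated surface and extras into one text string."""
    parts = [
        to_text(model),
        u"(" + translate(to_text(surface), table) + u")",
        to_text(extras).strip(),
    ]
    # Skip empty components, as the original loop does.
    return u" ".join(part for part in parts if part not in (u"", u"()"))


name = build_name(u"A100", b"kroom", u" +hoobkäepide", EST_TO_RUS)

# io.open encodes on write, so the file receives UTF-8 bytes.
out_path = os.path.join(tempfile.mkdtemp(), "data.txt")
with io.open(out_path, "w", encoding="utf-8") as f:
    f.write(name + u"\n")
```

Keeping byte strings out of the middle of the program means the only encoding decisions are made at the edges, where the data is read and where it is written.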
