2012-05-25

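Python encoding error when converting a MySQL query to XML

I use this script to generate custom XML files that must follow a specific format. It queries a database and turns the results into one large XML file. I do this for several databases, ranging from inventory parts lists to employee records.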

import csv
import StringIO
import time
import string
import shutil
import glob
import os
import sys
import logging
from datetime import datetime
from time import sleep

import MySQLdb
from lxml import etree
from lxml.builder import E as buildE

def logWrite(message): 
    logging.basicConfig(
     filename="C:\\logs\\XMLSyncOut.log", 
     level=logging.DEBUG, 
     format='%(asctime)s %(message)s', 
     datefmt='%m/%d/%Y %I:%M:%S: %p' 
    ) 
    logging.debug(message) 


def buildTag(tag, parent=None, content=None):
    element = buildE(tag)
    if content is not None:
        element.text = unicode(content)
    if parent is not None:
        parent.append(element)
    return element

def fetchXML(cursor):
    logWrite("constructing XML from cursor")
    fields = [x[0] for x in cursor.description]
    doc = buildTag('DATA')
    for record in cursor.fetchall():
        r = buildTag('ROW', parent=doc)
        for (k, v) in zip(fields, record):
            buildTag(k, content=v, parent=r)
    return doc

def updateDatabase1():
    try:
        conn = MySQLdb.connect(host='host', user='user', passwd='passwd', db='database')
        cursor = conn.cursor()
    except MySQLdb.Error:
        # Write the log message first, then exit with a non-zero status
        logWrite("Cannot connect to database - quitting!")
        sys.exit(1)

    cursor.execute("SELECT * FROM database.table")
    logWrite("Dumping fields from database.table into cursor")
    xmlFile = open("results.xml", "w")
    doc = fetchXML(cursor)
    xmlFile.write(etree.tostring(doc, pretty_print=True))
    logWrite("Writing XML results.xml")
    xmlFile.close()

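For some reason, one of the databases, a new one I imported from an Excel spreadsheet, has some kind of encoding error that the others don't. This is the error: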

element.text = unicode(content) 
UnicodeDecodeError: 'ascii' codec can't decode byte 0x96 in position 21: ordinal not in range(128) 
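The failure can be reproduced without MySQL or lxml. Below is a minimal Python 3 sketch. In the original Python 2 script, `unicode(content)` implicitly decodes a byte string as ASCII. The sample value is hypothetical and was chosen only because it contains the same byte, 0x96. Treating the data as Windows-1252 (cp1252) is an assumption, though that code page is common for text exported from Excel on Windows.

```python
# Hypothetical byte string containing 0x96, the byte named in the traceback
raw = b"Part A \x96 Part B"

try:
    raw.decode("ascii")  # what Python 2's unicode(content) does implicitly
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0x96 in position 7: ...

# In Windows-1252, byte 0x96 is an en dash (U+2013)
text = raw.decode("cp1252")
print(ascii(text))
```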

I tried explicitly converting the content to ASCII by changing the buildTag function to look like this:

def buildTag(tag, parent=None, content=None):
    element = buildE(tag)
    if content is not None:
        content = str(content).encode('ascii', 'ignore')
        element.text = content
    if parent is not None:
        parent.append(element)
    return element

This still didn't work.
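In Python 2, calling `.encode('ascii', 'ignore')` on a byte string first decodes it implicitly with the strict ASCII codec. A byte such as 0x96 therefore raises `UnicodeDecodeError` before the `'ignore'` handler is ever consulted. Error handlers only help at the decoding step. The following Python 3 sketch compares them on a hypothetical value, again assuming cp1252 is the encoding the data was actually written in:

```python
raw = b"Widget \x96 blue"  # hypothetical value containing byte 0x96

# Decoding with the wrong codec: 'ignore' silently drops the byte,
# 'replace' substitutes U+FFFD (the Unicode replacement character)
dropped = raw.decode("ascii", "ignore")
replaced = raw.decode("ascii", "replace")

# Decoding with the codec the bytes were written in keeps the character
kept = raw.decode("cp1252")

for label, value in (("ignore", dropped), ("replace", replaced), ("cp1252", kept)):
    print(label, ascii(value))
```

Only the last option preserves the dash, so nothing needs to be escaped or discarded.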

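Any ideas on what I can do to prevent this? I can't simply escape these characters, because a literal "\x92" must not appear in the records as output.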


You should set the character set of your MySQL connection. Execute `SET NAMES 'UTF8'` (or whichever encoding suits you). See the [manual](http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html) for more information.
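With MySQLdb, the connection character set can also be requested at connect time through the `charset` argument of `MySQLdb.connect`. Text columns then come back as Unicode strings that have already been decoded. The minimal sketch below makes several assumptions. The helper name `open_connection` is hypothetical. The `connect` callable is passed in so the function can be exercised without a server. The value `'utf8'` only helps if MySQL knows the columns' real character set.

```python
def open_connection(connect, charset="utf8", **params):
    """Open a MySQL connection whose client character set is `charset`.

    `connect` is assumed to be MySQLdb.connect (or a compatible callable);
    `open_connection` itself is a hypothetical helper.
    """
    conn = connect(charset=charset, use_unicode=True, **params)
    cursor = conn.cursor()
    # Same effect as the SET NAMES statement from the comment;
    # redundant but harmless if the driver already applied `charset`
    cursor.execute("SET NAMES %s" % charset)
    return conn, cursor

# Usage with the real driver (requires a reachable server):
# import MySQLdb
# conn, cursor = open_connection(MySQLdb.connect, host="host", user="user",
#                                passwd="passwd", db="database")
```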

Answers


I think your problem is a Windows encoding. You can try this in the shell:

In: print '\x92'.decode('cp1251')
Out: ’
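Here is the same check in Python 3, where raw data is `bytes` and has a `decode` method. The answer uses cp1251 (Cyrillic). For the two bytes in this thread, 0x92 and 0x96, cp1252 (Western European) yields the same punctuation. Which code page actually matches the spreadsheet is an assumption that should be checked against other non-ASCII bytes in the data.

```python
# Decode each byte from the thread with both Windows code pages
decoded = {
    (codec, byte): byte.decode(codec)
    for codec in ("cp1251", "cp1252")
    for byte in (b"\x92", b"\x96")
}

for (codec, byte), char in decoded.items():
    # Print the code point rather than the glyph to stay console-safe
    print(codec, byte, "U+%04X" % ord(char))
```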

I'll focus on this part:

element.text = unicode(content) 
UnicodeDecodeError: 'ascii' codec can't decode byte 0x96 in position 21: ordinal not in range(128) 

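I'm assuming `content` is of type `str`, that is, it contains bytes (this applies to Python 2 only). You need to know which encoding was used to produce these bytes. Then, to create a unicode object from them, you must tell Python explicitly how to decode them, for example: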

element.text = content.decode("utf-8")
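Below is a sketch of `buildTag` along the lines of this answer, written for Python 3. The standard library's `xml.etree.ElementTree` stands in for lxml here; both support `append` and `.text` on elements. The sketch assumes the raw bytes are Windows-1252, and `SOURCE_ENCODING` is a hypothetical name. UTF-8 would fail on byte 0x96 because that byte is not valid on its own in UTF-8. Values that are already text, or numbers and dates, are converted with `str()`.

```python
import xml.etree.ElementTree as ET

# Assumption: text imported from the Excel sheet was stored as Windows-1252 bytes
SOURCE_ENCODING = "cp1252"


def buildTag(tag, parent=None, content=None):
    element = ET.Element(tag)
    if content is not None:
        if isinstance(content, bytes):
            # Decode explicitly with the known encoding instead of relying on ASCII
            element.text = content.decode(SOURCE_ENCODING)
        else:
            # str, int, Decimal, datetime, ... are turned into text with str()
            element.text = str(content)
    if parent is not None:
        parent.append(element)
    return element


doc = buildTag("DATA")
row = buildTag("ROW", parent=doc)
buildTag("NAME", parent=row, content=b"Bolt \x96 M6")
buildTag("QTY", parent=row, content=12)

# encoding="unicode" makes tostring return a str rather than bytes
xml_text = ET.tostring(doc, encoding="unicode")
print(ascii(xml_text))
```

To write the file, `ET.ElementTree(doc).write("results.xml", encoding="utf-8", xml_declaration=True)` produces UTF-8 output with an XML declaration, so characters like the en dash are stored as-is rather than escaped.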