2016-03-22 57 views

Python, XML and MySQL - ASCII v UTF8 encoding issues

I have a MySQL table that stores XML content in a LONGTEXT field, using the utf8mb4_general_ci collation.

[Image: database table]

I want to use a Python script to read the XML data from the transcription field, modify an element, and then write the value back to the database.

When I try to parse the XML content into an Element using ElementTree.fromstring, I get the following encoding error:

Traceback (most recent call last):
  File "ImageProcessing.py", line 33, in <module>
    root = etree.fromstring(row[1])
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1300, in XML
    parser.feed(text)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1640, in feed
    self._parser.Parse(data, 0)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2014' in position 9568: ordinal not in range(128)

Code:

import datetime 
import mysql.connector 
import xml.etree.ElementTree as etree 

# Creates the config parameters, connects 
# to the database and creates a cursor 
config = { 
    'user': 'username', 
    'password': 'password', 
    'host': '127.0.0.1', 
    'database': 'dbname', 
    'raise_on_warnings': True, 
    'use_unicode': True, 
    'charset': 'utf8', 
} 
cnx = mysql.connector.connect(**config) 
cursor = cnx.cursor() 

# Structures the SQL query 
query = ("SELECT * FROM transcription") 

# Executes the query and fetches the first row 
cursor.execute(query) 
row = cursor.fetchone() 

while row is not None: 
    print(row[0]) 

    # Some of the things I have tried to resolve the encoding issue
    # parser = etree.XMLParser(encoding="utf-8")
    # root = etree.fromstring(row[1], parser=parser)
    # row[1].encode('ascii', 'ignore')

    # Line where the encoding error is being thrown
    root = etree.fromstring(row[1])

    for img in root.iter('img'):
        refno = img.text
        img.attrib['href'] = 'http://www.link.com/images.jsp?doc=' + refno
        print img.tag, img.attrib, img.text

    row = cursor.fetchone() 

cursor.close() 
cnx.close() 
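
Please provide the full stack trace of the error.

Seamus, we are at risk of widening the scope of the original question. You should create a new question for the new problem, and if you think this solved the original problem, please upvote and accept my answer.

I have now posted a new question, and I will update this one to restore its original scope.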

Answer

You have everything set up correctly, and the database connection is returning unicode objects, which is a good thing.

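Unfortunately, in Python 2, ElementTree's fromstring() expects a byte str rather than a unicode object. Given a unicode object, the parser implicitly encodes it with the default ascii codec, which fails on the first non-ASCII character (here u'\u2014'). Given bytes, ElementTree can decode them itself using the encoding declared in the XML header.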
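
The failure can be reproduced without the database. The following is a minimal sketch, assuming a hypothetical sample string that contains the same U+2014 character named in the traceback; it only illustrates why an ASCII encode fails where a UTF-8 encode succeeds.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample text; the real data comes from the transcription field.
text = u'<p>before \u2014 after</p>'

# Encoding to ASCII fails, because U+2014 has no ASCII representation.
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print(exc)

# Encoding to UTF-8 succeeds and yields a byte string.
utf8_bytes = text.encode('utf-8')
```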

You need to use this instead:

utf_8_xml = row[1].encode("utf-8") 
root = etree.fromstring(utf_8_xml) 
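
As a rough end-to-end sketch of the same idea: the helper name parse_xml_text and the sample document below are made up for illustration, and the last step assumes you want unicode text again before writing the value back to the database.

```python
# -*- coding: utf-8 -*-
import xml.etree.ElementTree as etree


def parse_xml_text(xml_text):
    """Parse XML held as text by handing the parser UTF-8 bytes.

    Hypothetical helper: encodes unicode input, passes bytes through.
    """
    if not isinstance(xml_text, bytes):
        xml_text = xml_text.encode('utf-8')
    return etree.fromstring(xml_text)


# Hypothetical stand-in for row[1] as returned by the connector.
sample = u'<doc><p>before \u2014 after</p><img>ref123</img></doc>'
root = parse_xml_text(sample)

# Same modification as in the question: add an href built from the text.
for img in root.iter('img'):
    img.set('href', 'http://www.link.com/images.jsp?doc=' + img.text)

# Serialise to UTF-8 bytes, then decode to unicode text for the write-back.
updated_text = etree.tostring(root, encoding='utf-8').decode('utf-8')
```

The decoded text could then be passed as a query parameter in an UPDATE statement. The column names are not shown in the question, so that step is left out here.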

I tried to upvote, but I don't have 15 reputation points :(