2017-03-29

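Python: error when handling an OrderedDict that contains Unicode data

My script migrates data from MySQL to MongoDB. It runs fine when none of the selected columns contain Unicode data, but it throws the following error as soon as I add the OrgLanguages column: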

mongoImp = dbo.insert_many(odbcArray) 
    File "/home/lrsa/.local/lib/python2.7/site-packages/pymongo/collection.py", line 711, in insert_many 
    blk.execute(self.write_concern.document) 
    File "/home/lrsa/.local/lib/python2.7/site-packages/pymongo/bulk.py", line 493, in execute 
    return self.execute_command(sock_info, generator, write_concern) 
    File "/home/lrsa/.local/lib/python2.7/site-packages/pymongo/bulk.py", line 319, in execute_command 
    run.ops, True, self.collection.codec_options, bwc) 
bson.errors.InvalidStringData: strings in documents must be valid UTF-8: 'Portugu\xeas do Brasil, ?????, English, Deutsch, Espa\xf1ol latinoamericano, Polish' 

My code:

import MySQLdb, MySQLdb.cursors, sys, pymongo, collections 

odbcArray=[] 
mongoConStr = '192.168.10.107:36006' 
sqlConnect = MySQLdb.connect(host = "54.175.170.187", user = "testuser", passwd = "testuser", db = "testdb", cursorclass=MySQLdb.cursors.DictCursor) 
mongoConnect = pymongo.MongoClient(mongoConStr) 

sqlCur = sqlConnect.cursor() 
sqlCur.execute("SELECT ID,OrgID,OrgLanguages,APILoginID,TransactionKey,SMTPSpeed,TimeZoneName,IsVideoWatched FROM organizations") 

dbo = mongoConnect.eaedw.mysqlData 
tuples = sqlCur.fetchall() 

for tuple in tuples: 
    odbcArray.append(collections.OrderedDict(tuple)) 

mongoImp = dbo.insert_many(odbcArray) 

sqlCur.close() 
mongoConnect.close() 
sqlConnect.close() 
sys.exit() 

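The script above migrates the data without any problem when the OrgLanguages column is left out of the SELECT query. To work around the issue, I tried building the OrderedDict() a different way, but that gave me a different kind of error.
Changed code: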

for tuple in tuples: 
    doc = collections.OrderedDict() 
    doc['oid'] = tuple.OrgID 
    doc['APILoginID'] = tuple.APILoginID 
    doc['lang'] = unicode(tuple.OrgLanguages) 
    odbcArray.append(doc) 
mongoImp = dbo.insert_many(odbcArray) 

Error received:

Traceback (most recent call last): 
    File "pymsql.py", line 19, in <module> 
    doc['oid'] = tuple.OrgID 
AttributeError: 'dict' object has no attribute 'OrgID' 

Answer


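Your MySQL connection is returning strings in an encoding other than UTF-8, which is the encoding every BSON string must use. Try your original code again, but pass charset='utf8' to MySQLdb.connect.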
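If changing the connection is not an option, the rows can also be cleaned up on the Python side before they reach insert_many. The following is only a minimal sketch under some assumptions: the sample row is made up (its OrgLanguages value is a shortened copy of the bytes in the error message), the source encoding is guessed to be Latin-1 because \xea and \xf1 decode to "ê" and "ñ" there, and the helper names to_text, row_to_document and is_valid_utf8 are hypothetical. It also shows why the second attempt raised AttributeError: a DictCursor returns plain dicts, so columns are read as row['OrgID'], not row.OrgID.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import collections

# Hypothetical sample row, shaped like what a MySQLdb DictCursor returns:
# a plain dict keyed by column name. The OrgLanguages bytes are a shortened
# copy of the string in the error message and are assumed to be Latin-1.
sample_row = {
    'OrgID': 42,
    'APILoginID': 'abc123',
    'OrgLanguages': b'Portugu\xeas do Brasil, English, Espa\xf1ol latinoamericano',
}


def is_valid_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False


def to_text(value, source_encoding='latin-1'):
    """Decode byte strings to text; leave every other value untouched."""
    if isinstance(value, bytes):
        return value.decode(source_encoding)
    return value


def row_to_document(row, source_encoding='latin-1'):
    """Build an ordered document from a dict row using key access."""
    doc = collections.OrderedDict()
    doc['oid'] = row['OrgID']  # key access: dict rows have no .OrgID attribute
    doc['APILoginID'] = to_text(row['APILoginID'], source_encoding)
    doc['lang'] = to_text(row['OrgLanguages'], source_encoding)
    return doc


# The raw bytes are not valid UTF-8, which is what the BSON encoder rejects.
print(is_valid_utf8(sample_row['OrgLanguages']))

# After decoding, the document holds text that can be encoded as UTF-8.
doc = row_to_document(sample_row)
print(list(doc.keys()))
```

In the original script the loop body would then become odbcArray.append(row_to_document(row)). If the connection can be changed, passing charset='utf8' (optionally together with use_unicode=True) to MySQLdb.connect is likely the simpler fix, since the driver then hands back correctly decoded strings and no per-row decoding is needed.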