2010-07-03

Google App Engine: UnicodeDecodeError in bulk data upload

I'm getting a strange error with the Google App Engine devserver 1.3.5 and Python 2.5.4 on Windows.

A sample row of the CSV:

EQS,550,foobar,"<some><html><garbage /></html></some>",odp,Ti4=,http://url.com,success 

The error:

..................................................................................................................[ERROR ] [Thread-1] WorkerThread: 
Traceback (most recent call last): 
    File "C:\Program Files\Google\google_appengine\google\appengine\tools\adaptive_thread_pool.py", line 150, in WorkOnItems 
    status, instruction = item.PerformWork(self.__thread_pool) 
    File "C:\Program Files\Google\google_appengine\google\appengine\tools\bulkloader.py", line 695, in PerformWork 
    transfer_time = self._TransferItem(thread_pool) 
    File "C:\Program Files\Google\google_appengine\google\appengine\tools\bulkloader.py", line 852, in _TransferItem 
    self.request_manager.PostEntities(self.content) 
    File "C:\Program Files\Google\google_appengine\google\appengine\tools\bulkloader.py", line 1296, in PostEntities 
    datastore.Put(entities) 
    File "C:\Program Files\Google\google_appengine\google\appengine\api\datastore.py", line 282, in Put 
    req.entity_list().extend([e._ToPb() for e in entities]) 
    File "C:\Program Files\Google\google_appengine\google\appengine\api\datastore.py", line 687, in _ToPb 
    properties = datastore_types.ToPropertyPb(name, values) 
    File "C:\Program Files\Google\google_appengine\google\appengine\api\datastore_types.py", line 1499, in ToPropertyPb 
    pbvalue = pack_prop(name, v, pb.mutable_value()) 
    File "C:\Program Files\Google\google_appengine\google\appengine\api\datastore_types.py", line 1322, in PackString 
    pbvalue.set_stringvalue(unicode(value).encode('utf-8')) 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 36: ordinal not in range(128) 
[INFO ] Unexpected thread death: Thread-1 
[INFO ] An error occurred. Shutting down... 
..[ERROR ] Error in Thread-1: 'ascii' codec can't decode byte 0xe8 in position 36: ordinal not in range(128) 

Is the error caused by a problem with the base64 string? There is one in each row, for example:

KGxwMAoobHAxCihTJ0JJT0VFJwpwMgpJMjYxMAp0cDMKYWEu 

KGxwMAoobHAxCihTJ01BVEgnCnAyCkkyOTQwCnRwMwphYS4= 
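
One way to test the base64 hypothesis is to decode the sample values and check whether the resulting bytes are plain ASCII. The sketch below does that for the three values quoted in this question only; `is_ascii` and `decoded_samples_are_ascii` are hypothetical helper names, and the code assumes a current Python rather than 2.5. Other rows in the file could of course behave differently.

```python
import base64

# The three base64 values quoted in the question (sample row plus two others).
SAMPLES = [
    "Ti4=",
    "KGxwMAoobHAxCihTJ0JJT0VFJwpwMgpJMjYxMAp0cDMKYWEu",
    "KGxwMAoobHAxCihTJ01BVEgnCnAyCkkyOTQwCnRwMwphYS4=",
]


def is_ascii(data):
    """Return True if every byte in `data` decodes as ASCII."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


def decoded_samples_are_ascii(samples):
    """Decode each base64 string and check that the payload is pure ASCII."""
    return all(is_ascii(base64.b64decode(s)) for s in samples)


if __name__ == "__main__":
    print(decoded_samples_are_ascii(SAMPLES))
```

If this check passes for a given row, the base64 column of that row cannot be the source of a byte such as 0xe8, and the other string columns are the place to look.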

The data loader:

import base64

from google.appengine.tools import bulkloader


class CourseLoader(bulkloader.Loader):
    def __init__(self):
        # Each tuple maps a CSV column to a property name and a converter.
        bulkloader.Loader.__init__(self, 'Course',
                                   [('dept_code', str),
                                    ('number', int),
                                    ('title', str),
                                    ('full_description', str),
                                    ('unparsed_pre_reqs', str),
                                    ('pickled_pre_reqs', lambda x: base64.b64decode(x)),
                                    ('course_catalog_url', str),
                                    ('parse_succeeded', lambda x: x == 'success')
                                   ])

loaders = [CourseLoader]

Is there a way to tell from the traceback which row caused the error?

UPDATE: It looks like two characters are causing the error: è and ®. How can I get Google App Engine to handle them?

I would try to find that code in GAE and add tracing/logging information. – 2010-07-03 06:14:46

Answer

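It looks like some rows of the CSV contain non-ASCII data (for example LATIN SMALL LETTER E WITH GRAVE, which is what 0xe8 would be in ISO-8859-1), yet you are mapping those columns to str. The converter should be unicode, and I believe the CSV should be in utf-8.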
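
A minimal sketch of that conversion, assuming the encoding of the file is known. `make_decoder` is a hypothetical helper name, not part of the App Engine SDK, and the sketch targets a current Python rather than the 2.5 used in the question:

```python
def make_decoder(encoding):
    """Build a converter that turns raw CSV bytes into text.

    `encoding` is an assumption about the file (for example 'utf-8').
    """
    def convert(raw):
        # Values that are already text are passed through unchanged.
        if isinstance(raw, bytes):
            return raw.decode(encoding)
        return raw
    return convert


if __name__ == "__main__":
    to_text = make_decoder('utf-8')
    # b'\xc3\xa8' is the UTF-8 encoding of the letter e with grave accent.
    print(repr(to_text(b'caf\xc3\xa8')))
```

In the loader, something like `('title', make_decoder('utf-8'))` would take the place of `('title', str)`; on Python 2 the equivalent one-liner is `lambda x: unicode(x, 'utf-8')`. This wiring is untested against the bulkloader itself. Note that a lone 0xe8 byte is not valid UTF-8, so if the file turns out to be ISO-8859-1, pass `'iso-8859-1'` instead.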

To find out whether any lines of a text file contain non-ASCII data, a simple Python snippet helps, such as:

>>> f = open('thefile.csv')
>>> prob = []
>>> for i, line in enumerate(f):
...     try: unicode(line)
...     except UnicodeDecodeError: prob.append(i)
...
>>> print 'Problems in %d lines:' % len(prob)
>>> print prob
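
The same idea can be sketched as a reusable function that also reports where in each line the first offending byte sits, which helps map the "position 36" in the traceback back to a row. `find_non_ascii_lines` is a hypothetical name, and the code assumes a current Python rather than 2.5:

```python
def find_non_ascii_lines(path):
    """Return (line_number, byte_offset, offending_byte) for each bad line.

    Line numbers are 1-based; the offset is the position of the first
    non-ASCII byte within that line.
    """
    problems = []
    # Read in binary mode so no implicit decoding happens on open.
    with open(path, 'rb') as f:
        for lineno, raw in enumerate(f, 1):
            try:
                raw.decode('ascii')
            except UnicodeDecodeError as exc:
                bad = raw[exc.start:exc.start + 1]
                problems.append((lineno, exc.start, bad))
    return problems


if __name__ == "__main__":
    for lineno, offset, bad in find_non_ascii_lines('thefile.csv'):
        print('line %d, offset %d: %r' % (lineno, offset, bad))
```

The offsets here are relative to the whole CSV line, whereas the offset in the traceback is relative to a single field value, so the two numbers will generally differ.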

Looks like you're right. Do I need a different datastore property to hold values like this? – 2010-07-03 17:17:39

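@Rosarch, the datastore's 'StringProperty' and 'TextProperty' hold unicode objects perfectly well (the latter via the 'Text' subclass of unicode), as described at http://code.google.com/appengine/docs/python/datastore/typesandpropertyclasses.html#Text. The problem is the use of 'str' in your code: it should be 'unicode', decoding with the CSV's encoding (I believe the "correct" encoding here is utf-8). No "different datastore property" is needed. – 2010-07-03 17:26:17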
