2013-03-19 36 views

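python appengine UnicodeEncodeError on Search API snippeted results

I'm crawling pages and indexing them with the App Engine Search API (Spanish and Catalan pages, with accented characters). I'm able to perform searches and build a page of results.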

The problem arises when I try to use a query object with snippeted_fields, because it always raises a UnicodeEncodeError:

  File "/home/otger/python/jobs-gae/src/apps/search/handlers/results.py", line 82, in find_documents
    return index.search(query_obj)
  File "/opt/google_appengine_1.7.6/google/appengine/api/search/search.py", line 2707, in search
    apiproxy_stub_map.MakeSyncCall('search', 'Search', request, response)
  File "/opt/google_appengine_1.7.6/google/appengine/api/apiproxy_stub_map.py", line 94, in MakeSyncCall
    return stubmap.MakeSyncCall(service, call, request, response)
  File "/opt/google_appengine_1.7.6/google/appengine/api/apiproxy_stub_map.py", line 320, in MakeSyncCall
    rpc.CheckSuccess()
  File "/opt/google_appengine_1.7.6/google/appengine/api/apiproxy_rpc.py", line 156, in _WaitImpl
    self.request, self.response)
  File "/opt/google_appengine_1.7.6/google/appengine/ext/remote_api/remote_api_stub.py", line 200, in MakeSyncCall
    self._MakeRealSyncCall(service, call, request, response)
  File "/opt/google_appengine_1.7.6/google/appengine/ext/remote_api/remote_api_stub.py", line 234, in _MakeRealSyncCall
    raise pickle.loads(response_pb.exception())
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 52: ordinal not in range(128)

I found a similar question on Stack Overflow, "GAE Full Text Search development console UnicodeEncodeError", but it says this was a bug fixed in 1.7.0. I get the same error with versions 1.7.5 and 1.7.6.

When indexing pages I add two fields: description and description_ascii. If I try to generate snippets for description_ascii, it works fine.
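The question does not show how description_ascii is built. As a rough sketch, assuming it is simply the description with accents stripped, something like the following would produce it. The names to_ascii and build_fields are made up for illustration, and the returned pairs would still have to be wrapped in Search API fields (for example search.TextField) before indexing.

```python
import unicodedata


def to_ascii(text):
    """Return text with accents stripped, keeping only ASCII characters."""
    # NFKD splits an accented letter into base letter + combining mark;
    # encoding to ASCII with 'ignore' then drops the combining marks.
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


def build_fields(description):
    """Hypothetical helper: (name, value) pairs for the two indexed fields."""
    return [
        ('description', description),
        ('description_ascii', to_ascii(description)),
    ]
```

Note that any character without an ASCII base letter is dropped entirely, so this is only suitable as a fallback field, not as a replacement for the original text.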

Is it possible to generate snippets of non-ASCII content on dev_appserver?

Answer


I think this is a bug. I reported it as a new defect: https://code.google.com/p/googleappengine/issues/detail?id=9335

Temporary fix for the development server: locate the google.appengine.api.search module (search.py) in the SDK and patch the function _DecodeUTF8 so that it looks like this:

def _DecodeUTF8(pb_value):
    """Decodes a UTF-8 encoded string into unicode."""
    if pb_value is not None:
        # Decode only byte strings; values that are already unicode pass through.
        return pb_value.decode('utf-8') if not isinstance(pb_value, unicode) else pb_value
    return None
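The isinstance guard matters because, on Python 2, calling .decode('utf-8') on a value that is already unicode first encodes it implicitly with the ascii codec. That is presumably where the UnicodeEncodeError for u'\xf3' in the traceback comes from. A standalone sketch of the same guard, not tied to the SDK and written so it runs on Python 2 and 3 (decode_utf8 and text_type are illustrative names, not SDK names):

```python
# The text type is `unicode` on Python 2 and `str` on Python 3.
text_type = type(u'')


def decode_utf8(value):
    """Decode UTF-8 bytes to text; pass None and existing text through."""
    if value is None:
        return None
    if isinstance(value, text_type):
        # Already text: decoding again is what triggers the ascii error
        # on Python 2, so return it untouched.
        return value
    return value.decode('utf-8')
```

With this guard, both the UTF-8 bytes of u'descripci\xf3n' and the unicode string itself come back as the same unicode value.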

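Workaround: until the issue is resolved, implement the snippet functionality yourself. Assuming the field the snippet is based on is called snippet_base: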

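from google.appengine.api import search

# Return the raw field instead of asking the API for a snippet.
query = search.Query(
    query_string=query_string,
    options=search.QueryOptions(
        ...
        returned_fields=[... 'snippet_base' ...]
    ))
results = search.Index(name="<index-name>").search(query)
if results:
    for res in results.results:
        # Build the snippet locally from the returned field.
        res.snippet = some_snippeting_function(res.field("snippet_base"))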
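The answer leaves some_snippeting_function undefined. A minimal sketch of what it could look like, assuming it receives the field's text and the search term as unicode strings; make_snippet, width and marker are names invented here, and this does no highlighting or word-boundary handling:

```python
def make_snippet(text, query, width=60, marker=u'...'):
    """Return about `width` characters of text around the first match of query."""
    if not text:
        return u''
    pos = text.lower().find(query.lower())
    if pos < 0:
        # No match: fall back to the beginning of the text.
        start = 0
    else:
        # Center the window on the match where possible.
        start = max(0, pos - width // 2)
    end = min(len(text), start + width)
    snippet = text[start:end]
    # Mark the sides where text was cut off.
    if start > 0:
        snippet = marker + snippet
    if end < len(text):
        snippet = snippet + marker
    return snippet
```

In the loop above, the text would most likely come from the value attribute of the returned field rather than from the field object itself, but that depends on how the function is written.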

Thanks for the workaround – Otger 2013-05-21 08:49:30