2012-10-04 98 views

UnicodeDecodeError when iterating over a MongoDB collection

I want to query a MongoDB database using Python 2.7 and pymongo-2.3, like this:

from pymongo import Connection 

connection = Connection() 
db = connection['db-name'] 
collections = db.subName 
entries = collections['collection-name'] 
print entries 
# > Collection(Database(Connection('localhost', 27017), u'db-name'), u'subName.collection-name') 

for entry in entries.find(): 
    pass 

The iteration fails even though I do nothing with the entry object:

Traceback (most recent call last): 
File "/Users/../mongo.py", line 27, in <module> 
    for entry in entries.find(): 
File "/Library/Python/2.7/site-packages/pymongo-2.3-py2.7-macosx-10.8-intel.egg/pymongo/cursor.py", line 778, in next 
File "/Library/Python/2.7/site-packages/pymongo-2.3-py2.7-macosx-10.8-intel.egg/pymongo/cursor.py", line 742, in _refresh 
File "/Library/Python/2.7/site-packages/pymongo-2.3-py2.7-macosx-10.8-intel.egg/pymongo/cursor.py", line 686, in __send_message 
File "/Library/Python/2.7/site-packages/pymongo-2.3-py2.7-macosx-10.8-intel.egg/pymongo/helpers.py", line 111, in _unpack_response 
UnicodeDecodeError: 'utf8' codec can't decode byte 0xfc in position 744: invalid start byte 

I am not the creator of the database I am trying to query. Does anyone know what I am doing wrong and how I can fix it? Thanks.


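Update: I managed to skip the failing line by wrapping it in a try-except in pymongo/helpers.py, but I would prefer a solution that does not involve data loss.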

try: 
    result["data"] = bson.decode_all(response[20:], as_class, tz_aware, uuid_subtype) 
except: 
    result["data"] = [] 

Answers

Can you try the same operation using the mongo shell? I want to figure out whether this is Python-specific or corruption in the database:

$ mongo db-name 
> var collection = db.getCollection('subName.collection-name') 
> collection.find().forEach(function(doc) { printjson(doc); }) 

Sorry. To speed things up, I tried '…' and it seems to be Python-related.

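OK, can you run 'for entry in entries.find().sort([('_id', 1)]): print entry['_id']' in Python? That will give you the _id just before the problematic document. Then put that _id into the shell, 'collection.findOne({_id: {$gt: my_id}})', and post the result here.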
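The suggestion above can be sketched as a small helper. This is a minimal sketch, not pymongo API: `last_good_id` is a hypothetical name, and it works on any iterable of documents so it can be tried without a database. One caveat: the traceback fails inside `_unpack_response`, which decodes a whole server reply at once, so the bad document may be anywhere in the batch that follows the returned _id, not necessarily the very next one.

```python
def last_good_id(docs):
    """Iterate over documents and remember the last _id read successfully.

    Returns (last_id, error): error is the UnicodeDecodeError that stopped
    the iteration, or None if every document was read.
    """
    last_id = None
    try:
        for doc in docs:
            last_id = doc['_id']
    except UnicodeDecodeError as exc:
        return last_id, exc
    return last_id, None


# Assumed usage with the cursor from the question (needs a running server):
#   last_id, error = last_good_id(entries.find().sort([('_id', 1)]))
```

The returned _id is the value to plug into the `$gt` query in the shell.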

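Oddly, this now fails in the console as well as in Python: 'collection.findOne({_id: {$gt: ObjectId("4ebcd5f0ed7c5031a103ba68")}})'. I don't know why I didn't catch this the first time I tried. The output is 'decode failed. probably invalid utf-8 string', followed by a bunch of garbage, and then 'why: TypeError: UTF-8 character 0x1234567 too large src/mongo/shell/utils.js:1018'.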