Removing error-generating documents from a corpus

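I have a set of 1000 documents stored in lsm-db on my computer; they are encoded and compressed files. When I try to decompress and decode them, I get an error saying "incorrect header check".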

Here is what I am doing:

import zlib

for key in my_lsm_db.keys():
    print(key, zlib.decompress(my_lsm_db[key], zlib.MAX_WBITS | 32).decode('utf-8'))

After processing a few keys, the code raises an error. The error I get is: error: Error -3 while decompressing data: incorrect header check

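I want to remove all such error-generating documents from the corpus. How can I identify the documents that raise the error so that I can delete them?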

def remove_docs(my_lsm_db):
    for key in my_lsm_db.keys():
        ## write code that identifies an error when generated
        if <code that identifies document generating error>:
            del my_lsm_db[key]

Here are some references on zlib and on the MAX_WBITS part of the code: Zlib Compression, Stack Overflow Answer for Zlib Automatic Header Detection
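For context, here is a minimal sketch of what the `zlib.MAX_WBITS | 32` argument does. It uses made-up sample bytes, not the actual corpus: with that flag zlib auto-detects a zlib or gzip header, and input carrying neither header is rejected with `zlib.error`.

```python
import gzip
import zlib

# Hypothetical payload standing in for one stored document.
text = "sample document".encode("utf-8")

zlib_blob = zlib.compress(text)              # starts with a zlib header
gzip_blob = gzip.compress(text)              # starts with a gzip header
bad_blob = b"plain text, never compressed"   # no valid header at all

AUTO = zlib.MAX_WBITS | 32  # auto-detect a zlib or gzip header

# Both well-formed blobs decompress back to the original bytes.
for blob in (zlib_blob, gzip_blob):
    assert zlib.decompress(blob, AUTO) == text

# Bytes without a valid header raise zlib.error.
try:
    zlib.decompress(bad_blob, AUTO)
    failed = False
except zlib.error as exc:
    failed = True
    print("rejected:", exc)
```

So a key that triggers the error most likely holds a value that was never compressed, or was stored in a different format such as a raw deflate stream.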


2017-03-27 Minu

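I tried wrapping my code in a try/except block to get past the documents that raise these errors. It worked not only for the code above but for other cases as well.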

try:
    <code to execute>
except (<list of errors>) as e:
    print(e)
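Applied to the `remove_docs` function from the question, the try/except idea could look roughly like the sketch below. It assumes the database object supports `keys()`, item lookup, and `del`, as the question's code does; `find_bad_keys` is a hypothetical helper name. Bad keys are collected first and deleted afterwards, so the store is not modified while it is being iterated.

```python
import zlib


def find_bad_keys(db, wbits=zlib.MAX_WBITS | 32):
    """Return the keys whose values fail to decompress or decode."""
    bad = []
    for key in list(db.keys()):
        try:
            zlib.decompress(db[key], wbits).decode("utf-8")
        except (zlib.error, UnicodeDecodeError):
            # zlib.error covers "incorrect header check";
            # UnicodeDecodeError covers payloads that are not valid UTF-8.
            bad.append(key)
    return bad


def remove_docs(db):
    """Delete every document that cannot be decompressed and decoded."""
    bad = find_bad_keys(db)
    for key in bad:
        del db[key]
    return bad
```

Calling `remove_docs(my_lsm_db)` returns the list of deleted keys, which may be worth logging before trusting the cleanup on the full corpus.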


2017-04-28 15:39:27 Minu
