2012-05-11 107 views

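Lucene updateDocument does not delete documents

This looks like a common problem, except that I have never run into it before and the usual fixes do not work. It is probably something silly, but I cannot find it.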

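I want to index a Yammer site because the Yammer API is not fast enough for my purposes. The problem is that when I update the index with the updateDocument function, the old documents are not deleted, even though I have a unique key that is stored and not analyzed.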

Here is the relevant code:

Document newdoc = new Document(); 
newdoc.add(new Field(YammerMessageFields.URL, resultUrl, Field.Store.YES, Field.Index.NOT_ANALYZED)); 
newdoc.add(new Field(YammerMessageFields.THREAD_ID, threadID.toString(), Field.Store.YES, Field.Index.NOT_ANALYZED)); 
newdoc.add(new Field(YammerMessageFields.AUTHOR, senderName, Field.Store.YES, Field.Index.ANALYZED)); 
newdoc.add(new Field(YammerMessageFields.CONTENTS, resultText, Field.Store.YES, Field.Index.ANALYZED)); 
Term key = new Term(YammerMessageFields.THREAD_ID, newdoc.getFieldable(YammerMessageFields.THREAD_ID).toString()); 
logger.debug("updating document with key: " + key); 
try { 
    IndexWriter writer = getIndexWriter(); 
    writer.updateDocument(key, newdoc); 
    writer.close(); 
} catch (IOException e) { 
} 

This is what I see in the log:

2012-05-11 12:02:29,816 DEBUG [http-8088-2] LuceneIndex - https://www.yammer.com/api/v1/messages/?newer_than=0 
2012-05-11 12:02:38,594 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173285202> 
2012-05-11 12:02:45,167 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173033239> 
2012-05-11 12:02:51,686 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173014568> 
2012-05-11 12:02:51,871 DEBUG [http-8088-2] LuceneIndex - new items:3 

2012-05-11 12:03:27,393 DEBUG [http-8088-2] YammerResource - return all documents 
2012-05-11 12:03:27,405 DEBUG [http-8088-2] YammerResource - nr docs:3 
2012-05-11 12:03:27,405 DEBUG [http-8088-2] YammerResource - nr dels:0 

... 
next update 
... 

2012-05-11 12:03:35,802 DEBUG [http-8088-2] LuceneIndex - https://www.yammer.com/api/v1/messages/?newer_than=0 
2012-05-11 12:03:43,933 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173322760> 
2012-05-11 12:03:50,467 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173285202> 
2012-05-11 12:03:56,982 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173056406> 
2012-05-11 12:04:03,533 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173033239> 
2012-05-11 12:04:10,097 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173030769> 
2012-05-11 12:04:16,629 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173014568> 
2012-05-11 12:04:23,169 DEBUG [http-8088-2] LuceneIndex - updating document with key: threadid:stored,indexed<threadid:173003570> 
2012-05-11 12:04:23,341 DEBUG [http-8088-2] LuceneIndex - new items:7 

2012-05-11 12:05:09,694 DEBUG [http-8088-1] YammerResource - return all documents 
2012-05-11 12:05:09,696 DEBUG [http-8088-1] YammerResource - nr docs:10 
2012-05-11 12:05:09,696 DEBUG [http-8088-1] YammerResource - nr dels:0 

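So three keys are repeated (and four are new), but after the second update my index holds 10 documents instead of 7 live documents plus 3 deleted ones.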

Edit: here is how I look up the items. I also display them and have checked the index with Luke.

IndexReader r = IndexReader.open(searchIndex.getIndex());
List<Document> docList = new ArrayList<Document>();
List<Document> delList = new ArrayList<Document>();

int num = r.numDocs();
num += r.numDeletedDocs();
for (int i = 0; i < num && i < max; i++)
{
    if (! r.isDeleted(i))
        docList.add(r.document(i));
    else
        delList.add(r.document(i));
}
r.close();
logger.debug("nr docs:" + docList.size());
logger.debug("nr dels:" + delList.size());

Using Lucene 3.4.0, by the way. – Rhand


Which API call do you use to find the number of documents? –


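Possibly. If you call `maxDoc`, it is known not to take deleted documents into account. Of course, I am talking about the Lucene API call itself, not anything layered on top of it. –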

Answers


I cannot be sure without running some test code, but this looks wrong to me:

Term key = new Term(YammerMessageFields.THREAD_ID, 
    newdoc.getFieldable(YammerMessageFields.THREAD_ID).toString()); 

Are you sure it shouldn't be:

Term key = new Term(YammerMessageFields.THREAD_ID, 
    newdoc.getFieldable(YammerMessageFields.THREAD_ID).stringValue()); 

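You then use that key to try to update any matching existing document. If the key is wrong, the document update will presumably fail silently. I suspect that the toString() call used to build that Term is just giving you something like an object reference rather than the field's value, which means the update can never work.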

Calling toString() for anything other than logging or debugging (that is, for anything involving logic) is usually a mistake.
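To make the mechanism concrete, here is a minimal Lucene-free sketch. It relies on two things: updateDocument is documented as deleting the documents that contain the given term and then adding the new document, and a NOT_ANALYZED field is indexed as a single term holding its raw value. The log in the question shows the key text as `stored,indexed<threadid:173285202>`, which is a description of the field rather than its value, so the delete step matches nothing. The classes `ToyField` and `ToyIndex` and the method `replay` are hypothetical stand-ins written for this illustration, not Lucene APIs; only the thread IDs are taken from the question's log.

```java
import java.util.ArrayList;
import java.util.List;

// Lucene-free toy model; every class and method name here is hypothetical.
class UpdateKeyDemo {

    /** Stand-in for an indexed, non-analyzed field. */
    static final class ToyField {
        final String name;
        final String value;

        ToyField(String name, String value) {
            this.name = name;
            this.value = value;
        }

        /** The raw value, which is what the toy index stores as the term text. */
        String stringValue() {
            return value;
        }

        /** Debug description, shaped like the key text in the question's log. */
        @Override
        public String toString() {
            return "stored,indexed<" + name + ":" + value + ">";
        }
    }

    /** Stand-in for an index that holds one key term per document. */
    static final class ToyIndex {
        private final List<String> keys = new ArrayList<>();
        private final List<Boolean> deleted = new ArrayList<>();

        /** Marks live documents whose key equals termText as deleted, then appends the new one. */
        void updateDocument(String termText, ToyField keyField) {
            for (int i = 0; i < keys.size(); i++) {
                if (!deleted.get(i) && keys.get(i).equals(termText)) {
                    deleted.set(i, true);
                }
            }
            keys.add(keyField.stringValue());
            deleted.add(false);
        }

        int numDeletedDocs() {
            int n = 0;
            for (boolean d : deleted) {
                if (d) n++;
            }
            return n;
        }

        int numDocs() {
            return keys.size() - numDeletedDocs();
        }
    }

    /** Replays the two batches from the question's log; returns {numDocs, numDeletedDocs}. */
    static int[] replay(boolean useStringValue) {
        String[][] batches = {
            {"173285202", "173033239", "173014568"},
            {"173322760", "173285202", "173056406", "173033239",
             "173030769", "173014568", "173003570"}
        };
        ToyIndex index = new ToyIndex();
        for (String[] batch : batches) {
            for (String id : batch) {
                ToyField field = new ToyField("threadid", id);
                String termText = useStringValue ? field.stringValue() : field.toString();
                index.updateDocument(termText, field);
            }
        }
        return new int[] {index.numDocs(), index.numDeletedDocs()};
    }

    public static void main(String[] args) {
        int[] wrong = replay(false);
        int[] right = replay(true);
        System.out.println("toString() key:    docs=" + wrong[0] + " dels=" + wrong[1]);
        System.out.println("stringValue() key: docs=" + right[0] + " dels=" + right[1]);
    }
}
```

Running `main` prints `docs=10 dels=0` for the toString() key, which matches the counts in the question's log, and `docs=7 dels=3` for the stringValue() key, which is the result the question expected. In the toy model, as in the question, the add step always succeeds; only the delete step depends on the key text matching the indexed term exactly.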


.stringValue() solved the problem, thanks. Strangely, the original code worked with another version... – Rhand


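No, that is not strange. Programmers often change the implementation of `toString()` methods; that is why you should never rely on them returning a specific value. – Jon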


By the way, the update did not fail: the document was actually added. Only the delete failed. – Rhand
