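Your mistake is assuming that Lucene's built-in transactional capabilities offer performance and guarantees comparable to a typical relational database, when they really don't. More specifically, in your case a commit syncs all index files with the disk, which makes commit time proportional to index size. That is why your indexWriter.commit() takes more and more time. The Javadoc for IndexWriter.commit() even warns:
"This may be a costly operation, so you should test the cost in your application and do it only when really necessary."
Can you imagine database documentation telling you to avoid committing?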
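Since your main goal seems to be keeping database updates visible through Lucene searches in a timely manner, do the following to improve the situation:
- Have indexWriter.deleteDocuments(..) and indexWriter.addDocument(..) triggered after a successful database commit, rather than before.
- Perform indexWriter.commit() periodically instead of on every transaction, just to make sure your changes are eventually written to disk.
- Use a SearcherManager for searching and invoke maybeRefresh() periodically to see updated documents within a reasonable time frame.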
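The first point can be sketched without any Lucene dependency. This is a minimal model under stated assumptions, not code from the question: DeferredIndexQueue and its method names are hypothetical, and it assumes one queue per transaction used from a single thread. Index operations are buffered while the database transaction is open, applied only once the database commit has succeeded, and discarded on rollback.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Lucene): buffers index operations until the
// database transaction has committed, then applies them in insertion order.
// Assumption: one instance per transaction, used from a single thread.
class DeferredIndexQueue {
    private final List<Runnable> pending = new ArrayList<>();

    // Called while the database transaction is still open.
    void enqueue(Runnable indexOperation) {
        pending.add(indexOperation);
    }

    // Called only after the database commit has succeeded.
    // Returns the number of index operations that were applied.
    int afterDatabaseCommit() {
        int applied = 0;
        for (Runnable operation : pending) {
            operation.run();
            applied++;
        }
        pending.clear();
        return applied;
    }

    // Called when the database transaction rolls back:
    // the buffered operations never reach the index.
    void afterDatabaseRollback() {
        pending.clear();
    }
}
```

In a real application each enqueued Runnable would wrap the indexWriter.deleteDocuments(..) and indexWriter.addDocument(..) calls for one changed row, so a rolled-back transaction leaves the index untouched.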
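The following is an example program that demonstrates how document updates can be retrieved by periodically calling maybeRefresh(). It builds an index of 100000 documents, uses a ScheduledExecutorService to set up periodic invocation of commit() and maybeRefresh(), prompts you to update a single document, and then searches repeatedly until the update is visible. All resources are properly cleaned up when the program terminates. Note that the controlling factor for when the update becomes visible is when maybeRefresh() is invoked, not commit().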
import java.io.IOException;
import java.nio.file.Paths;
import java.util.Scanner;
import java.util.concurrent.*;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.search.*;
import org.apache.lucene.store.FSDirectory;

public class LucenePeriodicCommitRefreshExample {
    ScheduledExecutorService scheduledExecutor;
    MyIndexer indexer;
    MySearcher searcher;

    void init() throws IOException {
        scheduledExecutor = Executors.newScheduledThreadPool(3);
        indexer = new MyIndexer();
        indexer.init();
        searcher = new MySearcher(indexer.indexWriter);
        searcher.init();
    }

    void destroy() throws IOException {
        searcher.destroy();
        indexer.destroy();
        scheduledExecutor.shutdown();
    }

    class MyIndexer {
        IndexWriter indexWriter;
        Future<?> commitFuture;

        void init() throws IOException {
            indexWriter = new IndexWriter(FSDirectory.open(Paths.get("C:\\Temp\\lucene-example")), new IndexWriterConfig(new StandardAnalyzer()));
            // Start from an empty index and build 100000 documents.
            indexWriter.deleteAll();
            for (int i = 1; i <= 100000; i++) {
                add(String.valueOf(i), "whatever " + i);
            }
            indexWriter.commit();
            // Commit every 5 minutes, only so that changes eventually reach the disk.
            commitFuture = scheduledExecutor.scheduleWithFixedDelay(() -> {
                try {
                    indexWriter.commit();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }, 5, 5, TimeUnit.MINUTES);
        }

        void add(String id, String text) throws IOException {
            Document doc = new Document();
            doc.add(new StringField("id", id, Field.Store.YES));
            doc.add(new StringField("text", text, Field.Store.YES));
            indexWriter.addDocument(doc);
        }

        void update(String id, String text) throws IOException {
            indexWriter.deleteDocuments(new Term("id", id));
            add(id, text);
        }

        void destroy() throws IOException {
            commitFuture.cancel(false);
            indexWriter.close();
        }
    }

    class MySearcher {
        IndexWriter indexWriter;
        SearcherManager searcherManager;
        Future<?> maybeRefreshFuture;

        public MySearcher(IndexWriter indexWriter) {
            this.indexWriter = indexWriter;
        }

        void init() throws IOException {
            // Near-real-time searcher opened from the writer, so it can see uncommitted changes.
            searcherManager = new SearcherManager(indexWriter, true, null);
            // Refresh every 5 seconds; this is what makes updates visible to searches.
            maybeRefreshFuture = scheduledExecutor.scheduleWithFixedDelay(() -> {
                try {
                    searcherManager.maybeRefresh();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }, 0, 5, TimeUnit.SECONDS);
        }

        // Returns the stored text for the given id, or null if no document matches.
        String findText(String id) throws IOException {
            IndexSearcher searcher = null;
            try {
                searcher = searcherManager.acquire();
                TopDocs topDocs = searcher.search(new TermQuery(new Term("id", id)), 1);
                if (topDocs.scoreDocs.length == 0) {
                    // Unknown id, or a refresh landed between the delete and the add.
                    return null;
                }
                return searcher.doc(topDocs.scoreDocs[0].doc).getField("text").stringValue();
            } finally {
                if (searcher != null) {
                    searcherManager.release(searcher);
                }
            }
        }

        void destroy() throws IOException {
            maybeRefreshFuture.cancel(false);
            searcherManager.close();
        }
    }

    public static void main(String[] args) throws IOException {
        LucenePeriodicCommitRefreshExample example = new LucenePeriodicCommitRefreshExample();
        example.init();
        // Release Lucene resources and stop the scheduler on JVM shutdown.
        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override
            public void run() {
                try {
                    example.destroy();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        });
        try (Scanner scanner = new Scanner(System.in)) {
            System.out.print("Enter a document id to update (from 1 to 100000): ");
            String id = scanner.nextLine();
            System.out.print("Enter what you want the document text to be: ");
            String text = scanner.nextLine();
            example.indexer.update(id, text);
            long startTime = System.nanoTime();
            String foundText;
            // Search repeatedly until the refreshed searcher returns the new text.
            do {
                foundText = example.searcher.findText(id);
            } while (!text.equals(foundText));
            long elapsedTimeMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startTime);
            System.out.format("it took %d milliseconds for the searcher to see that document %s is now '%s'\n", elapsedTimeMillis, id, text);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            System.exit(0);
        }
    }
}
This example was successfully tested with Lucene 5.3.1 and JDK 1.8.0_66.
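The point that maybeRefresh(), not commit(), controls when an update becomes searchable can be illustrated with a toy model. This is only a sketch and deliberately not Lucene: VisibilityModel and all of its members are hypothetical names, and plain maps stand in for the writer's state, the searcher's snapshot, and what is on disk.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (not Lucene): writes go to a live map, searches read a snapshot
// that is replaced only by refresh(). commit() affects durability only.
class VisibilityModel {
    private final Map<String, String> live = new HashMap<>();  // writer's in-memory state
    private Map<String, String> snapshot = new HashMap<>();    // what searches can see
    private Map<String, String> durable = new HashMap<>();     // what would survive a crash

    synchronized void update(String id, String text) {
        live.put(id, text);
    }

    // Analogous to commit(): makes changes durable, does not touch the search snapshot.
    synchronized void commit() {
        durable = new HashMap<>(live);
    }

    // Analogous to maybeRefresh(): publishes the writer's current state to searches.
    synchronized void refresh() {
        snapshot = new HashMap<>(live);
    }

    synchronized String search(String id) {
        return snapshot.get(id);
    }

    synchronized String durableValue(String id) {
        return durable.get(id);
    }
}
```

In this model an update followed only by commit() is durable but still invisible to search(), while an update followed only by refresh() is searchable but not yet durable, which is why the example program schedules the two calls independently.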
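Can you solve it with a background task? You would probably incur a 10-second penalty, but that is acceptable for many applications – AdamSkywalker
@AdamSkywalker - but it keeps getting slower. What happens when it takes 1 hour, 10 hours, or 2 days? –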