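Lucene: fastest way to return the number of documents in which a phrase occurs?
I want to use Lucene (actually PyLucene!) to find out how many documents contain my exact phrase. My code currently looks like the following, but it runs slowly. Does anyone know a faster way to return the document count?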
phraseList = ["some phrase 1", "some phrase 2"]  # etc., a list of phrases...
countsearcher = IndexSearcher(SimpleFSDirectory(File(STORE_DIR)), True)  # True = read-only
analyzer = StandardAnalyzer(Version.LUCENE_CURRENT)
for phrase in phraseList:
    # Wrap the phrase in double quotes so it is parsed as an exact phrase.
    query = QueryParser(Version.LUCENE_CURRENT, "contents", analyzer).parse("\"" + phrase + "\"")
    # Note: this retrieves at most 200 hits, so the printed count never exceeds 200.
    scoreDocs = countsearcher.search(query, 200).scoreDocs
    print "count is: " + str(len(scoreDocs))
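One possible direction, offered as a rough sketch rather than a definitive answer: read the total from the `totalHits` field of the returned `TopDocs` instead of counting `scoreDocs`, and build the `QueryParser` once outside the loop. This assumes a Lucene 3.x-era PyLucene with the flat `lucene` namespace, where `totalHits` is a plain integer. The helper names `count_phrase_docs` and `count_phrases` are hypothetical, and whether this is noticeably faster on a given index is untested here.

```python
def count_phrase_docs(searcher, parser, phrase):
    """Return how many documents match `phrase` as an exact phrase."""
    # Quote the phrase so the parser builds a phrase query, as in the question.
    query = parser.parse('"' + phrase + '"')
    # Request a single hit: totalHits still reports every matching document,
    # whereas len(scoreDocs) can never exceed the requested limit.
    return searcher.search(query, 1).totalHits


def count_phrases(store_dir, phrases):
    """Hypothetical driver; assumes the flat `lucene` module of PyLucene 3.x."""
    import lucene
    lucene.initVM()
    directory = lucene.SimpleFSDirectory(lucene.File(store_dir))
    searcher = lucene.IndexSearcher(directory, True)  # read-only searcher
    analyzer = lucene.StandardAnalyzer(lucene.Version.LUCENE_CURRENT)
    # Build the parser once instead of once per phrase.
    parser = lucene.QueryParser(lucene.Version.LUCENE_CURRENT, "contents", analyzer)
    return dict((p, count_phrase_docs(searcher, parser, p)) for p in phrases)
```

Calling `count_phrases(STORE_DIR, phraseList)` would then return a dictionary mapping each phrase to its document count. If the installed Lucene version ships `TotalHitCountCollector`, passing one to `search` is another option worth trying, since it counts matches without scoring or ranking them.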