I haven't tried using Neo4j together with Lucene, but as an alternative you can use Lucene directly with a RAMDirectory (an in-memory index).
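import org.apache.lucene.analysis.standard.StandardAnalyzer
import org.apache.lucene.index.IndexWriterConfig
import org.apache.lucene.store.RAMDirectory
import org.apache.lucene.util.Version
val analyzer = new StandardAnalyzer(Version.LUCENE_43)
val index = new RAMDirectory()
val config = new IndexWriterConfig(Version.LUCENE_43, analyzer)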
Then, once Lucene is set up, you can add data to the index:
mkIndex(articles) // articles: Iterable[Article]
The index is made up of documents, which mkIndex writes:
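import scala.collection.JavaConverters._
import org.apache.lucene.index.IndexWriter
def mkIndex(xs: Iterable[Article]): Unit = {
  // Loan pattern: open a writer, run f, and close the writer even if f throws.
  def withWriter[T](f: IndexWriter => T): T = {
    val iw = new IndexWriter(index, config)
    try f(iw) finally iw.close()
  }
  // addDocuments expects a java.lang.Iterable, hence asJava.
  withWriter { _.addDocuments(xs.map(mkDoc).asJava) }
}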
So we need to build a document from each article:
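import org.apache.lucene.document.{Document, Field, TextField}
// make is a helper not shown here; it is assumed to apply the block to the new Document and return it.
def mkDoc(art: Article): Document = make(new Document) { doc =>
  // Field.Store.YES keeps the original value so search can read it back with doc.get.
  doc.add(new TextField("id", art.id.toString, Field.Store.YES))
  doc.add(new TextField("data", art.content, Field.Store.YES))
  doc.add(new TextField("author", art.author, Field.Store.YES))
}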
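The snippets here rely on an Article type and a make helper that aren't shown. A minimal sketch of what they might look like follows. Both definitions are assumptions for illustration (in particular, id is taken to be a String, which matches how search rebuilds an Article from stored fields), not the original code:

```scala
// Hypothetical domain type; field names follow their use in mkDoc and search.
final case class Article(id: String, content: String, author: String)

// Hypothetical helper: run an initialisation block on a freshly created
// value and then return that same value.
def make[T](value: T)(init: T => Unit): T = {
  init(value)
  value
}

// Usage: build and populate a StringBuilder in a single expression.
val greeting = make(new StringBuilder) { sb => sb.append("hello") }
```

If your real Article has a numeric id, you would need to convert the stored string back (for example with toLong) when reading results.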
Once the index is ready, we need a search function:
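import org.apache.lucene.index.DirectoryReader
import org.apache.lucene.queryparser.classic.QueryParser
import org.apache.lucene.search.{IndexSearcher, TopScoreDocCollector}
/**
 * id - the query string (e.g. your article ID), parsed with Lucene query syntax
 * field - the default field for query terms
 * lim - maximum number of results
 */
def search(id: String, field: String, lim: Int): Seq[Article] = {
  val reader = DirectoryReader.open(index)
  try {
    val searcher = new IndexSearcher(reader)
    val collector = TopScoreDocCollector.create(lim, true)
    val q = new QueryParser(Version.LUCENE_43, field, analyzer).parse(id)
    searcher.search(q, collector)
    val hits = collector.topDocs().scoreDocs
    // Load the stored fields while the reader is still open.
    hits.toSeq.map { hit =>
      val doc = searcher.doc(hit.doc)
      Article(doc.get("id"), doc.get("data"), doc.get("author"))
    }
  } finally reader.close() // close the reader even if parsing or searching fails
}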
Because the query string is parsed by QueryParser, this search function also lets you run fuzzy or wildcard searches.
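To make that concrete, here is a small sketch of how the classic QueryParser should interpret fuzzy, wildcard and prefix syntax. The field name "data" and the sample terms are only illustrative, and this assumes Lucene 4.3 with the lucene-queryparser and lucene-analyzers-common modules on the classpath:

```scala
import org.apache.lucene.analysis.standard.StandardAnalyzer
import org.apache.lucene.queryparser.classic.QueryParser
import org.apache.lucene.search.{FuzzyQuery, PrefixQuery, WildcardQuery}
import org.apache.lucene.util.Version

val queryAnalyzer = new StandardAnalyzer(Version.LUCENE_43)
// "data" is the default field for terms that do not name a field explicitly.
val parser = new QueryParser(Version.LUCENE_43, "data", queryAnalyzer)

// term~N: fuzzy match, allowing up to N edits (here 1).
val fuzzy = parser.parse("lucene~1")
// ? matches exactly one character inside the term.
val wildcard = parser.parse("luc?ne")
// A single trailing * matches any term starting with the prefix.
val prefix = parser.parse("luc*")
```

The same strings can be passed straight to the search function, e.g. search("lucene~1", "data", 10). Note that, as far as I know, a leading wildcard such as *cene is rejected by default unless you call setAllowLeadingWildcard(true) on the parser.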
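This isn't a direct answer about best practices for using Neo4j, but rather an alternative approach. On a small AWS machine, this setup completes a fuzzy search over 50k documents in under a second.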