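Date range queries using Lucene spatial search / DateRangePrefixTree?
I'm using Lucene 6.3, but I can't figure out what is wrong with the following very basic search query. It simply adds two documents, each with a single date range, and then searches over a larger range that should find both documents. What is wrong?
There are inline comments that should make the example fairly self-explanatory. Let me know if anything is unclear.
Note that my main requirement is to be able to run date range queries alongside queries on other fields, such as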
text:interesting date:[2014 TO NOW]
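This is based on watching the Lucene spatial deep dive video, which introduces DateRangePrefixTree and the strategy framework.
Rant: given how simple my example is, it feels like if I'm making a mistake here, I should get a validation error at query time or at write time.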
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.*;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.*;
import org.apache.lucene.spatial.prefix.NumberRangePrefixTreeStrategy;
import org.apache.lucene.spatial.prefix.PrefixTreeStrategy;
import org.apache.lucene.spatial.prefix.tree.DateRangePrefixTree;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;
import java.util.Calendar;
import java.util.Date;

public class TestLuceneDatePrefix {

    /*
        All these names should be lower case as field names are case sensitive in Lucene.
     */
    private static final String NAME = "name";
    public static final String TIME = "time";

    private Directory directory;
    private StandardAnalyzer analyzer;
    private ScoreDoc lastDocOnPage;
    private IndexWriterConfig indexWriterConfig;

    @Before
    public void setup() {
        analyzer = new StandardAnalyzer();
        directory = new RAMDirectory();
        indexWriterConfig = new IndexWriterConfig(analyzer);
    }

    @Test
    public void testAddDocumentAndSearchByDate() throws IOException {
        IndexWriter w = new IndexWriter(directory, new IndexWriterConfig(analyzer));

        // Responsible for creating the prefix string/geohash/token to identify the date.
        // aka Create post codes
        DateRangePrefixTree prefixTree = new DateRangePrefixTree(DateRangePrefixTree.JAVA_UTIL_TIME_COMPAT_CAL);

        // Strategy indexing the token.
        // aka transform post codes into tokens that make them efficient to search.
        PrefixTreeStrategy strategy = new NumberRangePrefixTreeStrategy(prefixTree, TIME);

        createDocument(w, "Bill", new Date(2017,1,1), prefixTree, strategy);
        createDocument(w, "Ted", new Date(2018,1,1), prefixTree, strategy);
        w.close();

        // The documents are written; now try to query them.
        DirectoryReader reader;
        try {
            QueryParser queryParser = new QueryParser(NAME, analyzer);
            System.out.println(queryParser.getLocale());

            // Surely searching only on year for the easiest case should work?
            Query q = queryParser.parse("time:[1972 TO 4018]");

            // The following query returns 1 result, so Lucene is set up.
            // Query q = queryParser.parse("name:Ted");

            reader = DirectoryReader.open(directory);
            IndexSearcher searcher = new IndexSearcher(reader);
            TotalHitCountCollector totalHitCountCollector = new TotalHitCountCollector();
            int hitsPerPage = 10;
            searcher.search(q, hitsPerPage);
            TopDocs docs = searcher.search(q, hitsPerPage);
            ScoreDoc[] hits = docs.scoreDocs;

            // Hit count is zero and no document printed!!
            // Putting a dependency on mockito would make this code harder to paste and run.
            System.out.println("Hit count : "+hits.length);
            for (int i = 0; i < hits.length; ++i) {
                System.out.println(searcher.doc(hits[i].doc));
            }
            reader.close();
        }
        catch (ParseException e) {
            e.printStackTrace();
        }
    }

    private void createDocument(IndexWriter w, String name, Date fromDate, DateRangePrefixTree prefixTree, PrefixTreeStrategy strategy) throws IOException {
        Document doc = new Document();

        // Store a text/stored field for the name. This helps indicate that Lucene is working.
        doc.add(new TextField(NAME, name, Field.Store.YES));

        // Offset toDate to one day after fromDate.
        Calendar cal = Calendar.getInstance();
        cal.setTime(fromDate);
        cal.add(Calendar.DATE, 1);
        Date toDate = cal.getTime();

        // This lets the prefix tree create whatever tokens it needs
        // perhaps index year, date, second etc separately, hence multiple potential tokens.
        for (IndexableField field : strategy.createIndexableFields(prefixTree.toRangeShape(
                prefixTree.toUnitShape(fromDate), prefixTree.toUnitShape(toDate)))) {
            // Debugging the tokens produced is difficult as I can't intuitively look at them and know if they are valid.
            doc.add(field);
        }
        w.addDocument(doc);
    }
}
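A side note on the test data: `new Date(2017,1,1)` uses the deprecated `java.util.Date(int, int, int)` constructor, which treats the year as an offset from 1900 and the month as zero-based, so it does not represent 1 January 2017. The stdlib-only sketch below shows the year that constructor actually produces, next to a `GregorianCalendar`-based alternative. The class and helper names are hypothetical and not part of the test above. This would be consistent with the query range reaching up to 4018, but it is not claimed to be the cause of the zero hits.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

// Hypothetical demo class; only java.util is used.
class DeprecatedDateDemo {

    // Builds a Date the same way the test does: year is offset from 1900, month is 0-based.
    @SuppressWarnings("deprecation")
    static Date deprecatedDate(int year, int month, int day) {
        return new Date(year, month, day);
    }

    // Builds a Date from a calendar year and a Calendar.MONTH constant.
    static Date calendarDate(int year, int calendarMonth, int day) {
        return new GregorianCalendar(year, calendarMonth, day).getTime();
    }

    // Reads the Gregorian year back out of a Date, using the default time zone.
    static int yearOf(Date date) {
        Calendar cal = new GregorianCalendar();
        cal.setTime(date);
        return cal.get(Calendar.YEAR);
    }

    public static void main(String[] args) {
        // 2017 + 1900 = 3917, and month 1 is February.
        System.out.println(yearOf(deprecatedDate(2017, 1, 1)));               // prints 3917
        System.out.println(yearOf(calendarDate(2017, Calendar.JANUARY, 1)));  // prints 2017
    }
}
```

If the intent is 2017 and 2018, the Calendar-based helper (or `java.time`) states that directly. With the dates as written, the indexed ranges fall in the years 3917 and 3918.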
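Update:
I thought the answer might be to use SimpleAnalyzer instead of StandardAnalyzer, but that doesn't seem to work either.
My requirement of parsing user-supplied date ranges seems to be catered for by Solr, so I would expect this to be built on Lucene functionality.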