Multiple-term queries in Lucene

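For example: a Lucene document has a field named "description". Suppose the content of "description" is [hello foo bar]. I want the query [hello f] to hit the document, while [hello ff] or [hello b] should not hit it.

I build the Query programmatically, for example by adding PrefixQuery and TermQuery clauses to a BooleanQuery, but they do not work as expected. I am using StandardAnalyzer.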

Test cases:

A): new PrefixQuery(new Term("description", "hello f")) -> 0 hits

B): PhraseQuery query = new PhraseQuery(); query.add(new Term("description", "hello f*")) -> 0 hits

C): PhraseQuery query = new PhraseQuery(); query.add(new Term("description", "hello f")) -> 0 hits

Any suggestions? Thanks!

Source

2012-12-17 盧聲遠 Shengyuan Lu

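What have you tried? Can you show some code snippets? That would help us understand your problem better.

Have you tried using org.apache.lucene.queryParser.QueryParser to parse a query string such as "description:hello AND description:f*"? – pabrantes

@pabrantes "description:hello AND description:f*" is not what I expect; I want "hello" followed by "f".

Have you tried using NGram or EdgeNGram at index time?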

http://lucene.apache.org/core/old_versioned_docs/versions/2_9_0/api/all/org/apache/lucene/analysis/ngram/NGramTokenizer.html
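To illustrate the edge n-gram idea, here is a minimal plain-Java sketch of what such a tokenizer would emit for each word. It does not use Lucene's own n-gram classes; the class name `EdgeNGramSketch`, the method `edgeNGrams`, and the gram sizes are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; this is not Lucene's n-gram tokenizer.
final class EdgeNGramSketch {

    // Returns the leading substrings of `token` with lengths minGram..maxGram
    // (capped at the token length).
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int limit = Math.min(maxGram, token.length());
        for (int len = minGram; len <= limit; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Show the grams that would be indexed for each word of the example text.
        for (String token : "hello foo bar".split(" ")) {
            System.out.println(token + " -> " + edgeNGrams(token, 1, 5));
        }
    }
}
```

If grams like these are indexed, "f" becomes an exact term of the document while "ff" does not, so a prefix search turns into an ordinary term lookup. Whether the order of the words is respected still depends on the query type used at search time.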

Source

2012-12-17 09:32:28 rrsk

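It doesn't work because you are passing multiple terms to a single Term object. If you want all of your search words to be matched as prefixes, you need to:

Tokenize the input string with the analyzer, which splits the search text "hello f" into "hello" and "f":

TokenStream tokenStream = analyzer.tokenStream(null, new StringReader(searchText));
CharTermAttribute termAttribute = tokenStream.getAttribute(CharTermAttribute.class);

List<String> tokens = new ArrayList<String>();
tokenStream.reset(); // required before incrementToken() in Lucene 4.0 and later
while (tokenStream.incrementToken()) {
    tokens.add(termAttribute.toString());
}

Wrap each token in a Term object, put each Term in a PrefixQuery, and add all of the PrefixQueries to a BooleanQuery.

Edit: for example, like this: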

BooleanQuery booleanQuery = new BooleanQuery(); 

for(String token : tokens) {   
    booleanQuery.add(new PrefixQuery(new Term(fieldName, token)), Occur.MUST); 
}
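As a rough way to see which inputs such a query accepts, the following plain-Java sketch models its semantics: every query token must be a prefix of some indexed token, regardless of position. It is a toy model, not Lucene code, and the names `PrefixAndSketch` and `matchesAllPrefixes` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of "every clause is a required prefix match"; positions are ignored.
final class PrefixAndSketch {

    // True when each query token is a prefix of at least one indexed token.
    static boolean matchesAllPrefixes(List<String> indexedTokens, List<String> queryTokens) {
        return queryTokens.stream()
                .allMatch(q -> indexedTokens.stream().anyMatch(t -> t.startsWith(q)));
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("hello", "foo", "bar");
        System.out.println(matchesAllPrefixes(doc, Arrays.asList("hello", "f")));
        System.out.println(matchesAllPrefixes(doc, Arrays.asList("hello", "ff")));
        System.out.println(matchesAllPrefixes(doc, Arrays.asList("hello", "b")));
    }
}
```

Under this model [hello f] matches and [hello ff] does not, but [hello b] also matches, because "bar" starts with "b" and word order is not checked.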

Source

2012-12-17 10:09:48

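Thanks Adam! I have already used your first step with the analyzer. But the second step does not give the result I expect.

@Shengyuan please show your current code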
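The behaviour asked for in the question ("hello" immediately followed by a word starting with "f") depends on token positions. In Lucene this kind of match is usually expressed with a position-aware query such as MultiPhraseQuery, or a SpanNearQuery whose last clause wraps a PrefixQuery in SpanMultiTermQueryWrapper; which of these is available depends on the Lucene version. The sketch below is only a plain-Java model of the intended semantics, with hypothetical names (`PhrasePrefixSketch`, `matchesPhrasePrefix`), not an implementation of either query.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of an ordered phrase whose last word is matched as a prefix.
final class PhrasePrefixSketch {

    // True when the query tokens occur at consecutive positions in the document:
    // all but the last must match exactly, the last one as a prefix.
    static boolean matchesPhrasePrefix(List<String> doc, List<String> query) {
        if (query.isEmpty()) {
            return false;
        }
        int last = query.size() - 1;
        for (int start = 0; start + query.size() <= doc.size(); start++) {
            boolean exactPartMatches = true;
            for (int i = 0; i < last && exactPartMatches; i++) {
                exactPartMatches = doc.get(start + i).equals(query.get(i));
            }
            if (exactPartMatches && doc.get(start + last).startsWith(query.get(last))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("hello", "foo", "bar");
        System.out.println(matchesPhrasePrefix(doc, Arrays.asList("hello", "f")));
        System.out.println(matchesPhrasePrefix(doc, Arrays.asList("hello", "ff")));
        System.out.println(matchesPhrasePrefix(doc, Arrays.asList("hello", "b")));
    }
}
```

With the document [hello foo bar], this model accepts [hello f] and rejects both [hello ff] and [hello b], which are the three cases listed in the question.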
