2016-04-28 46 views

QueryParser with CustomAnalyzer messes up the order of PatternReplaceCharFilter

I am using org.apache.lucene.queryparser.classic.QueryParser in Lucene 6.0.0 to parse queries with a CustomAnalyzer, as shown below:

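import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;

public static void testFilmAnalyzer() throws IOException, ParseException {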
    CustomAnalyzer nameAnalyzer = CustomAnalyzer.builder() 
      .addCharFilter("patternreplace", 
        "pattern", "(movie|film|picture).*", 
        "replacement", "") 
      .withTokenizer("standard") 
      .build(); 

    QueryParser qp = new QueryParser("name", nameAnalyzer); 
    qp.setDefaultOperator(QueryParser.Operator.AND); 
    String[] strs = {"avatar film fiction", "avatar-film fiction", "avatar-film-fiction"}; 

    for (String str : strs) { 
     System.out.println("Analyzing \"" + str + "\":"); 
     showTokens(str, nameAnalyzer); 
     Query q = qp.parse(str); 
     System.out.println("Parsed query of \"" + str + "\":"); 
     System.out.println(q + "\n"); 
    } 
} 

private static void showTokens(String text, Analyzer analyzer) throws IOException {
    // try-with-resources closes the stream even if tokenization throws
    try (TokenStream stream = analyzer.tokenStream("name", new StringReader(text))) {
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        stream.reset();
        while (stream.incrementToken()) {
            System.out.print("[" + term.toString() + "]");
        }
        stream.end(); // the TokenStream contract expects end() before close()
    }
    System.out.println();
}

I get the following output when calling testFilmAnalyzer:

Analyzing "avatar film fiction": 
[avatar] 
Parsed query of "avatar film fiction": 
+name:avatar +name:fiction 

Analyzing "avatar-film fiction": 
[avatar] 
Parsed query of "avatar-film fiction": 
+name:avatar +name:fiction 

Analyzing "avatar-film-fiction": 
[avatar] 
Parsed query of "avatar-film-fiction": 
name:avatar 

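It seems that the analyzer applies the PatternReplaceCharFilter in the correct, expected order (that is, before tokenization), while QueryParser applies it afterwards. Does anyone have an explanation for this? Isn't this a bug?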

Answers


No, this is not a bug. Char filters are always applied before tokenization, both at query time and at index time.

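However, whitespace is meaningful in QueryParser syntax, and that is completely independent of analysis. The parser splits the query on whitespace and analyzes each clause on its own. This is easy to see if you don't rely on the default field, in which case the query avatar-film fiction has to be written as name:avatar-film name:fiction. Each of the two clauses, "avatar-film" and "fiction", is analyzed separately, which leads to the results you are seeing.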
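To make the clause-by-clause behaviour concrete, here is a minimal plain-Java sketch that imitates it without using Lucene at all. Everything in it is an assumption made for illustration: the class name ClauseSplitDemo and its methods are hypothetical, charFilter stands in for the PatternReplaceCharFilter, tokenize is only a crude approximation of the standard tokenizer, and parseLikeQueryParser mimics just the whitespace splitting, not the real query grammar.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in classes; none of this is Lucene API.
class ClauseSplitDemo {

    // Imitates the char filter from the question: drop everything
    // from the first "movie", "film" or "picture" to the end of the text.
    static String charFilter(String text) {
        return text.replaceAll("(movie|film|picture).*", "");
    }

    // Crude approximation of a tokenizer: break on anything that is
    // not a letter or a digit and discard empty pieces.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Analysis of one piece of text: char filter first, then tokenizer.
    static List<String> analyze(String text) {
        return tokenize(charFilter(text));
    }

    // Imitates the parser: split on whitespace first, then analyze
    // each clause separately and collect the surviving terms.
    static List<String> parseLikeQueryParser(String query) {
        List<String> terms = new ArrayList<>();
        for (String clause : query.trim().split("\\s+")) {
            terms.addAll(analyze(clause));
        }
        return terms;
    }

    public static void main(String[] args) {
        String[] inputs = {"avatar film fiction", "avatar-film fiction", "avatar-film-fiction"};
        for (String input : inputs) {
            System.out.println(input + " -> analyzed whole: " + analyze(input)
                    + ", parsed per clause: " + parseLikeQueryParser(input));
        }
    }
}
```

In this sketch, analyzing "avatar film fiction" as a whole leaves only avatar, because the char filter sees the entire string. Splitting on whitespace first means the filter only ever sees one clause at a time, so fiction survives as its own term, which matches the +name:avatar +name:fiction output in the question.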

Try using phrase queries:

String[] strs = {"\"avatar film fiction\"", "\"avatar-film fiction\"", "\"avatar-film-fiction\""}; 

and you should see the results you expect.