Lucene GermanAnalyzer treats different inputs differently

I have a problem with the GermanAnalyzer. I only need it for searching names. Say I have the documents {"Muller", "Mueller", "Müller"} in my table. Now, if I use

Analyzer analyzer = new GermanAnalyzer(Version.LUCENE_43);
String querystr = "Muller~0.1";
Query q = new QueryParser(Version.LUCENE_43, "Name", analyzer).parse(querystr);

it returns all of the documents, as expected. But when I try

String querystr = "Müller~0.1";

or

String querystr = "Mueller~0.1";

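it returns nothing. I don't know whether I am missing something or whether this is a bug. I don't think encoding is the problem, because the search for "Mueller" uses only plain ASCII characters. Any advice is appreciated.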


2013-06-13 JaKu

I solved the problem by normalizing the search string with GermanNormalizationFilter before building the query, using the helper tokenizeString(String string):

public static List<String> tokenizeString(String string) throws IOException {
    List<String> result = new ArrayList<String>();
    // Tokenize the input, then apply the German normalization to each token.
    Tokenizer source = new StandardTokenizer(Version.LUCENE_43, new StringReader(string));
    TokenStream stream = new StandardFilter(Version.LUCENE_43, source);
    stream = new GermanNormalizationFilter(stream);
    CharTermAttribute cattr = stream.addAttribute(CharTermAttribute.class);
    try {
        stream.reset();
        while (stream.incrementToken()) {
            result.add(cattr.toString());
        }
        stream.end();
    } finally {
        // Release the tokenizer and the underlying reader.
        stream.close();
    }
    return result;
}
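
To illustrate what the normalization step does to the three names, here is a minimal stdlib-only sketch. It is not Lucene code: the class GermanNameNormalizer and its normalize method are hypothetical names, and the rules are a simplified approximation of the ones documented for GermanNormalizationFilter (umlauts lose their diaeresis, sharp s becomes "ss", "ae"/"oe" become "a"/"o", and "ue" becomes "u" unless it follows a vowel or "q"). It assumes lowercase umlauts and ignores the rest of the analyzer chain, such as stemming.

```java
final class GermanNameNormalizer {

    // Simplified approximation of the documented GermanNormalizationFilter rules.
    // Assumption: umlauts and the digraph "e" appear in lowercase.
    static String normalize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean nextIsE = i + 1 < input.length() && input.charAt(i + 1) == 'e';
            boolean afterVowelOrQ = i > 0
                    && "aeiouq".indexOf(Character.toLowerCase(input.charAt(i - 1))) >= 0;
            switch (c) {
                case '\u00e4': out.append('a'); break;   // a-umlaut
                case '\u00f6': out.append('o'); break;   // o-umlaut
                case '\u00fc': out.append('u'); break;   // u-umlaut
                case '\u00df': out.append("ss"); break;  // sharp s
                case 'a':
                case 'o':
                    out.append(c);
                    if (nextIsE) i++;                    // drop the "e" of "ae" / "oe"
                    break;
                case 'u':
                    out.append(c);
                    // drop the "e" of "ue" unless the "u" follows a vowel or "q"
                    if (nextIsE && !afterVowelOrQ) i++;
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Print the normalized form of each spelling from the question.
        for (String name : new String[] {"Muller", "Mueller", "M\u00fcller"}) {
            System.out.println(name + " -> " + normalize(name));
        }
    }
}
```

Under these rules all three spellings reduce to the same string, which is why normalizing the search term before querying puts the three queries on equal footing.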


2013-06-18 15:05:11 JaKu
