2012-08-02 24 views

I want to build my own analyzer out of filters/tokenizers: KeywordAnalyzer plus LowerCaseFilter/LowerCaseTokenizer

What I mean is that the same field should be both a keyword (the entire stream as a single token) and lowercased.

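If I use KeywordAnalyzer alone, the field's value keeps its original case, so it is not lowercased. If I use LowerCaseTokenizer or LowerCaseFilter, I have to combine them with other analyzers that do not behave like KeywordAnalyzer (they split on non-letters or on whitespace, remove stop words, etc.).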
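To make the difference concrete, here is a minimal plain-Java sketch that only imitates the two behaviours described above; it does not use Lucene at all. The class name `KeywordLowercaseSketch` and both method names are hypothetical, and `String.toLowerCase(Locale.ROOT)` is an approximation of what LowerCaseFilter does, so edge cases may differ from the real library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class KeywordLowercaseSketch {

    // Imitates the desired behaviour: the whole value stays one token, lowercased.
    public static List<String> keywordLowercase(String value) {
        List<String> tokens = new ArrayList<String>();
        tokens.add(value.toLowerCase(Locale.ROOT));
        return tokens;
    }

    // Imitates a letter-based lowercasing tokenizer: splits at every non-letter
    // character and lowercases each piece.
    public static List<String> letterLowercase(String value) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String value = "New York-42";
        System.out.println(keywordLowercase(value)); // [new york-42]
        System.out.println(letterLowercase(value));  // [new, york]
    }
}
```

The first method is the behaviour I am after; the second shows why a letter-based tokenizer is not a substitute, since the value is broken apart and the digits are lost.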

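The question is: is there any way to make a field both a keyword (the entire stream as a single token) and lowercase, using Lucene's filters, analyzers, or tokenizers?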

(Translated with Google Translate, sorry for any mistakes)

Answers


This should work:

public final class YourAnalyzer extends ReusableAnalyzerBase {

    @Override
    protected TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
        // KeywordTokenizer emits the entire field value as a single token.
        final Tokenizer source = new KeywordTokenizer(reader);
        // LowerCaseFilter then lowercases that single token.
        return new TokenStreamComponents(source, new LowerCaseFilter(Version.LUCENE_36, source));
    }
}

In Lucene 3.6.2 it has to look like this:

import java.io.Reader;

import org.apache.lucene.analysis.KeywordTokenizer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.ReusableAnalyzerBase;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.util.Version;

public class YourAnalyzer extends ReusableAnalyzerBase {

    // Lucene version passed on to LowerCaseFilter.
    private final Version version;

    public YourAnalyzer(final Version version) {
        super();
        this.version = version;
    }

    @Override
    protected TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
        // KeywordTokenizer emits the entire field value as a single token.
        final Tokenizer source = new KeywordTokenizer(reader);
        // LowerCaseFilter then lowercases that single token.
        return new TokenStreamComponents(source, new LowerCaseFilter(this.version, source));
    }

}