2016-05-18

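HTMLStripCharFilter not working in a custom analyzer's createComponents implementation

I am using HTMLStripCharFilter inside the createComponents implementation of my custom analyzer, but the HTML is not stripped from the content. The code is below.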

@Override
protected TokenStreamComponents createComponents(String fieldName)
{
    StandardTokenizer source = new StandardTokenizer();
    source.setReader(mStripHTML ? new HTMLStripCharFilter(getReader()) : getReader());
    source.setMaxTokenLength(maxTokenLength);
    TokenStream result = new StandardFilter(source);
    result = new LowerCaseFilter(result);
    return new TokenStreamComponents(source, result);
}

Answers


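Your CharFilter should not be set up in createComponents; it belongs in initReader. The Analyzer calls initReader to wrap the incoming Reader and then hands the result to the tokenizer through setReader, so a reader set inside createComponents is replaced before any text is read: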

@Override
protected Reader initReader(String fieldName, Reader reader) {
    // Wrap the raw reader here so the tokenizer only sees HTML-stripped text.
    return mStripHTML ? new HTMLStripCharFilter(reader) : reader;
}

@Override
protected TokenStreamComponents createComponents(String fieldName)
{
    // No setReader call here: the Analyzer supplies the reader later.
    StandardTokenizer source = new StandardTokenizer();
    source.setMaxTokenLength(maxTokenLength);
    TokenStream result = new StandardFilter(source);
    result = new LowerCaseFilter(result);
    return new TokenStreamComponents(source, result);
}
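
To see why the placement matters, here is a small stand-alone sketch of the call order, written with plain java.io only. It is a toy model, not Lucene code: ToyAnalyzer, ToyTokenizer and ToyTagStripper are hypothetical names made up for illustration, and the tag stripper is far cruder than HTMLStripCharFilter. The only assumption carried over from Lucene is that the analyzer sets the tokenizer's reader to the result of initReader after createComponents has returned.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Toy model only: none of these classes are Lucene classes.
final class InitReaderSketch {

    /** Hypothetical stand-in for a tokenizer: it only remembers its reader. */
    static final class ToyTokenizer {
        private Reader input = new StringReader("");

        void setReader(Reader reader) { this.input = reader; }

        String readAll() throws IOException {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = input.read()) != -1) sb.append((char) c);
            return sb.toString();
        }
    }

    /** Hypothetical char filter: drops everything between '<' and '>'. */
    static final class ToyTagStripper extends FilterReader {
        private boolean inTag = false;

        ToyTagStripper(Reader in) { super(in); }

        @Override
        public int read() throws IOException {
            int c;
            while ((c = super.read()) != -1) {
                if (c == '<') inTag = true;
                else if (c == '>' && inTag) inTag = false;
                else if (!inTag) return c;
            }
            return -1;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            int n = 0;
            while (n < len) {
                int c = read();
                if (c == -1) break;
                buf[off + n++] = (char) c;
            }
            return (n == 0 && len > 0) ? -1 : n;
        }
    }

    /** Hypothetical analyzer base class modelling the assumed call order. */
    abstract static class ToyAnalyzer {
        protected abstract ToyTokenizer createComponents();

        protected Reader initReader(Reader reader) { return reader; }

        final String analyze(Reader reader) {
            try {
                ToyTokenizer tokenizer = createComponents();
                // This call replaces whatever reader createComponents set.
                tokenizer.setReader(initReader(reader));
                return tokenizer.readAll();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Filter attached in createComponents: it never sees the real text. */
    static String filterInCreateComponents(String text) {
        ToyAnalyzer analyzer = new ToyAnalyzer() {
            @Override
            protected ToyTokenizer createComponents() {
                ToyTokenizer t = new ToyTokenizer();
                t.setReader(new ToyTagStripper(new StringReader("")));
                return t;
            }
        };
        return analyzer.analyze(new StringReader(text));
    }

    /** Filter attached in initReader: it wraps the real text. */
    static String filterInInitReader(String text) {
        ToyAnalyzer analyzer = new ToyAnalyzer() {
            @Override
            protected ToyTokenizer createComponents() { return new ToyTokenizer(); }

            @Override
            protected Reader initReader(Reader reader) { return new ToyTagStripper(reader); }
        };
        return analyzer.analyze(new StringReader(text));
    }

    public static void main(String[] args) {
        String html = "<b>Hello</b> world";
        System.out.println(filterInCreateComponents(html)); // <b>Hello</b> world
        System.out.println(filterInInitReader(html));       // Hello world
    }
}
```

In the first variant the stripper wraps a reader that is thrown away, so the markup comes through untouched; in the second the stripper wraps the reader that is actually consumed.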

Thanks, but I found the solution in the Lucene documentation yesterday.
