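For indexing and querying I need to apply some of the transformations listed below, so I wrote a custom filter. How do I concatenate the tokens and pass the result on to the NGramFilterFactory filter? Please tell me what needs to be improved in the code. (Solr custom filter TokenStream issue)
Here is the configuration from the schema.xml file: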
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.TrimFilterFactory" />
<filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" />
<filter class="solr.StopFilterFactory" ignoreCase="true" words="custom_stop_words.txt"/>
<filter class="intuit.ripple.solr.ConcatFilterFactory" />
<filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="3" />
Here is an example of the case I am trying to solve:
1. Input value: "foo Bar Baz qux"
2. WhitespaceTokenizerFactory: "foo", "Bar", "Baz", "qux"
3. LowerCaseFilterFactory: "foo", "bar", "baz", "qux"
4. TrimFilterFactory, PatternReplaceFilterFactory and StopFilterFactory have nothing to do in this case.
5. ConcatFilterFactory: "foobarbazqux". It should concatenate the tokens.
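6. NGramFilterFactory: generates the 3-character grams of the concatenated token (minGramSize and maxGramSize are both 3): "foo", "oob", "oba", "bar", "arb", "rba", "baz", "azq", "zqu", "qux".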
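The steps above can be modelled outside Solr to check the expected output. The following is only a plain-Java sketch, not Lucene code: the class name `AnalysisChainSketch` and its two helper methods are hypothetical, and the stop-word step is left out because the contents of custom_stop_words.txt are not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-alone model of the analysis chain; it does not use Lucene or Solr.
class AnalysisChainSketch {

    // Steps 2-5: split on whitespace, lowercase each token,
    // strip everything outside a-z, then concatenate the tokens.
    static String concat(String input) {
        StringBuilder builder = new StringBuilder();
        for (String token : input.trim().split("\\s+")) {
            String cleaned = token.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
            builder.append(cleaned);
        }
        return builder.toString();
    }

    // Step 6: collect every substring of exactly n characters, left to right.
    static List<String> ngrams(String text, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        String joined = concat("foo Bar Baz qux");
        System.out.println(joined);            // foobarbazqux
        System.out.println(ngrams(joined, 3)); // [foo, oob, oba, bar, arb, rba, baz, azq, zqu, qux]
    }
}
```

A 12-character string yields 10 grams of length 3, which is the token list the real filter chain would be expected to produce for this input.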
Here is the incrementToken() method of ConcatFilter:
// Set once the single concatenated token has been emitted; cleared again in reset().
private boolean done = false;

@Override
public boolean incrementToken() throws IOException {
    if (done) {
        return false; // the concatenated token was already emitted
    }
    done = true;
    StringBuilder builder = new StringBuilder();
    // Drain the upstream stream, appending each term to the builder.
    while (input.incrementToken()) {
        builder.append(charTermAtt.buffer(), 0, charTermAtt.length());
    }
    if (builder.length() == 0) {
        return false; // no upstream tokens, nothing to emit
    }
    clearAttributes();
    charTermAtt.setEmpty().append(builder);
    posIncAtt.setPositionIncrement(1);
    setOffsetAttr.setOffset(0, builder.length());
    // Return true so that NGramFilterFactory receives the token. The consumer
    // calls end() and close(); the filter must not call them here.
    return true;
}

@Override
public void reset() throws IOException {
    super.reset();
    done = false;
}
Here I use a while loop to fetch all the tokens and join them together. Is there a way to get all the tokens at once, without a loop?
Possible duplicate (http://stackoverflow.com/questions/27560110/solr-custom-filter-for-cancatnating-tokens) – YoungHobbit 2015-08-15 12:31:50