2015-04-23

Solr multi-term synonym problem

I have defined synonyms as follows: facebook,fb,face book, face bk

Now, when I search for facebook, the parsed query is:

<str name="parsedquery_toString"> 
    text:facebook text:fb text:face text:face text:book text:bk 
</str> 

But if I search for face book, the parsed query is:

<str name="parsedquery_toString"> 
    text:face text:book 
</str> 

Shouldn't the parsed query be the same for both keywords?

Here is a snippet of my configuration:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100"> 
    <analyzer type="index"> 
    <tokenizer class="solr.StandardTokenizerFactory"/>  
    <filter class="solr.StopFilterFactory" 
      ignoreCase="true" 
      words="lang/stopwords_en.txt" 
      enablePositionIncrements="true" 
      /> 
    <filter class="solr.LowerCaseFilterFactory"/> 
    <filter class="solr.EnglishPossessiveFilterFactory"/> 
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/> 
    <filter class="solr.PorterStemFilterFactory"/> 
    </analyzer> 

    <analyzer type="query"> 
    <tokenizer class="solr.StandardTokenizerFactory"/> 
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" /> 
    <filter class="solr.StopFilterFactory" 
      ignoreCase="true" 
      words="lang/stopwords_en.txt" 
      enablePositionIncrements="true" 
      /> 
    <filter class="solr.LowerCaseFilterFactory"/> 
    <filter class="solr.EnglishPossessiveFilterFactory"/> 
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/> 
    <filter class="solr.PorterStemFilterFactory"/> 
    </analyzer> 
</fieldType> 

Here is my synonyms.txt:

#some test synonym mappings unlikely to appear in real input text 
aaafoo => aaabar 
bbbfoo => bbbfoo bbbbar 
cccfoo => cccbar cccbaz 
fooaaa,baraaa,bazaaa 

# Some synonym groups specific to this example 
GB,gib,gigabyte,gigabytes 
MB,mib,megabyte,megabytes 
facebook,fb,face book, face bk 
Television, Televisions, TV, TVs 
#notice we use "gib" instead of "GiB" so any WordDelimiterFilter coming 
#after us won't split it into two words. 

# Synonym mappings can be used for spelling correction too 
pixima => pixma 

Please show the synonyms file – Mysterion

@Mysterion I have updated the question – Jeyaprakash

I hate to duplicate information, but this answer lists many different solutions to the problem: http://stackoverflow.com/a/41837371/8123 –

Answer

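This is a well-known problem in Solr/Lucene. The query parser splits the input on whitespace before the analyzer runs, so the synonym filter sees face and book as separate tokens and never matches the multi-word entry face book. You can find more about it at:

If you want to solve this matching problem, you have a few options: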

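  1. Apply one of the plugins/parsers mentioned in the two resources above. As a downside, you will have to redo that work every time you upgrade Solr.
  2. Move the synonyms to index time. This is the preferred approach anyway, but it has its own drawbacks.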
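
Option 2 might look roughly like the following. This is a minimal sketch, assuming the same text_en field type and synonyms.txt from the question. The only intended change is that the SynonymFilterFactory moves from the query analyzer to the index analyzer; every other filter is copied from the question's configuration as-is.

```xml
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Assumption: synonyms are expanded here, at index time, so a document
         containing "facebook" is also indexed under fb, face, book and bk. -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>

  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- No synonym filter at query time: the query terms are matched
         against the already expanded index tokens. -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

With this setup, a search for facebook and a search for face book would still parse to different queries, but both should match the same documents, because the synonym group is already present in the index. One of the drawbacks is that existing documents have to be reindexed after this change, and again whenever synonyms.txt is edited.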