2012-02-13

Searching multi-word indexed terms with dismax

My Solr schema is as follows (only the important parts):

<fieldType name="bagofwords_expertfinding" class="solr.TextField" positionIncrementGap="100"> 
    <analyzer type="index"> 
    <!-- strip HTML markup before tokenizing -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/> 
    <tokenizer class="solr.StandardTokenizerFactory"/> 
    <filter class="solr.StopFilterFactory" 
      ignoreCase="true" 
      words="stopwords_en.txt" 
      enablePositionIncrements="true" 
      /> 
    <filter class="solr.LowerCaseFilterFactory"/> 
    <filter class="solr.EnglishPossessiveFilterFactory"/> 
    <filter class="solr.PatternReplaceFilterFactory" pattern="^[0-9-/_,\.]+$" replacement="" replace="all"/> 
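    <!-- blank out tokens containing a letter repeated three or more times in a row; the empty token is then dropped by LengthFilterFactory -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="^.*([a-zA-Z])\1{2,}.*$" replacement=""/>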
    <filter class="solr.PorterStemFilterFactory"/> 
    <filter class="solr.LengthFilterFactory" min="3" max="100"/> 
    </analyzer> 
    <analyzer type="query"> 
    <tokenizer class="solr.StandardTokenizerFactory"/> 
    <filter class="solr.StopFilterFactory" 
      ignoreCase="true" 
      words="stopwords_en.txt" 
      enablePositionIncrements="true" 
      /> 
    <filter class="solr.LowerCaseFilterFactory"/> 
    <filter class="solr.EnglishPossessiveFilterFactory"/> 
    <filter class="solr.PatternReplaceFilterFactory" pattern="^[0-9-/_,\.]+$" replacement="" replace="all"/> 
    <filter class="solr.PorterStemFilterFactory"/> 
    <filter class="solr.LengthFilterFactory" min="3" max="100"/> 
    </analyzer> 
</fieldType> 
<fieldType name="namedentities_expertfinding" class="solr.TextField" positionIncrementGap="100"> 
    <analyzer type="index"> 
    <!-- strip whitespace around commas, then split on commas so each entity is indexed as a single token -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s," replacement=","/> 
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern=",\s" replacement=","/> 
    <tokenizer class="solr.PatternTokenizerFactory" pattern="," /> 
    <filter class="solr.LowerCaseFilterFactory"/> 
    </analyzer> 
    <analyzer type="query"> 
    <tokenizer class="solr.WhitespaceTokenizerFactory"/> 
    <filter class="solr.StopFilterFactory" 
      ignoreCase="true" 
      words="stopwords_en.txt" 
      enablePositionIncrements="true" 
      /> 
    <filter class="solr.LowerCaseFilterFactory"/> 
    <filter class="solr.EnglishPossessiveFilterFactory"/> 
    <filter class="solr.PatternReplaceFilterFactory" pattern="^[0-9-/_,\.]+$" replacement="" replace="all"/> 
    <filter class="solr.LengthFilterFactory" min="3" max="100"/> 
    </analyzer> 
</fieldType> 

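In namedentities I index multi-word terms such as "diego alberto milito" and "diego armando maradona". I am trying to search both fields with a dismax query so that I can boost them.

But when I try this query: localhost:8080/solr/select/?q="maradona"&defType=dismax&qf=namedentities^100 bagofwords^1&fl=*,score&debugQuery=true&mm=0

Solr doesn't find anything. Maybe I don't understand the correct use of the " symbol.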

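I also don't understand this passage from the Solr wiki:

"In Solr 1.4 and prior, you should basically set mm=0 if you want the equivalent of q.op=OR, and mm=100% if you want the equivalent of q.op=AND. In 3.x and trunk the default value of mm is dictated by the q.op param (q.op=AND => mm=100%; q.op=OR => mm=0%). Keep in mind the default operator is affected by your schema.xml entry. In older versions of Solr the default value is 100% (all clauses must match)."

And given that defaultOperator is OR in my schema, why do I get a default mm value of 100% when I don't set mm=0?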

Thanks in advance!
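
For reference, the two settings the wiki passage refers to look roughly like the fragments below. This is only a sketch: the `defaultOperator` value is the one described in the question, while the handler name `/experts` and the choice to pin `mm` in handler defaults are assumptions made for illustration. Setting `mm` explicitly avoids depending on whichever version-specific default applies.

```xml
<!-- schema.xml (fragment): default operator used by the standard query parser -->
<solrQueryParser defaultOperator="OR"/>

<!-- solrconfig.xml (fragment): hypothetical handler that pins mm for dismax,
     so the result does not depend on the version's default -->
<requestHandler name="/experts" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="mm">0%</str>
  </lst>
</requestHandler>
```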

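The debug output of the parsed query would also be useful. I suspect that, because of the way the field is tokenized, your exact search will not match: neither of the indexed entries is the string you are searching for when you put it in quotes. – MatsLindh 2012-02-13 21:46:17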

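Thanks. I finally found out that quotes don't mean an exact match but a phrase search, i.e. a contiguous sequence of terms, so I changed my schema analyzer. But there is no way to handle multi-word tokens... so I now search for phrases in a single-word index. – Tywnil 2012-02-13 21:56:15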
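
A minimal sketch of the kind of word-level field type described in the comment above, assuming the same analysis is wanted at index and query time. The type name `namedentities_words` is hypothetical and not part of the original schema. Because each entity is split into words, a single-word query such as maradona can match, and a quoted query such as "diego armando maradona" becomes a phrase query over adjacent words.

```xml
<fieldType name="namedentities_words" class="solr.TextField" positionIncrementGap="100">
    <!-- one analyzer for both index and query time, so the tokens line up -->
    <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
```

One trade-off: entities separated only by a comma inside the same field value end up in adjacent positions, so a phrase can in principle match across two entities. Indexing each entity as a separate value of a multiValued field lets positionIncrementGap keep them apart.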

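Answer

The quotes around the query string above force a phrase query, which means only exact matches are considered. Remove them, replace them with parentheses, and experiment with the pf, pf2 and pf3 parameters to boost longer matching phrases.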
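
The suggestion above could be wired up as handler defaults along these lines. This is a sketch under assumptions: the handler name `/experts_phrases` and all `pf`, `pf2` and `pf3` boost values are made up for illustration, and it assumes `namedentities` is tokenized into words, since phrase boosts have nothing to work with when each entity is a single comma-delimited token. Note that `pf2` and `pf3` belong to the extended dismax parser (`defType=edismax`); plain dismax only offers `pf`.

```xml
<!-- solrconfig.xml (fragment): hypothetical handler with phrase boosts -->
<requestHandler name="/experts_phrases" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- edismax is needed for pf2/pf3; with plain dismax keep only pf -->
    <str name="defType">edismax</str>
    <!-- fields searched for the individual query words -->
    <str name="qf">namedentities^100 bagofwords^1</str>
    <!-- boost documents where all query words appear as one phrase -->
    <str name="pf">namedentities^200</str>
    <!-- boost documents where pairs / triples of adjacent query words appear -->
    <str name="pf2">namedentities^50</str>
    <str name="pf3">namedentities^100</str>
    <!-- illustrative boost values; tune against real data -->
    <str name="mm">0%</str>
  </lst>
</requestHandler>
```

With defaults like these, the request only needs the unquoted query words in `q`; the phrase parameters then reward documents in which those words occur next to each other.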