2013-10-09 91 views
2

I use SolrJ as the client for indexing documents on a Solr server. I am new to Solr and I am having a problem with highlighting: highlighting an exact phrase does not work.

For example, if the search phrase is "dulce hogar", the highlighter returns:

<i> dulce </i> <i> hogar </i> 

It should be:

<i> dulce hogar </i> 

I don't understand what the problem is.

My schema.xml:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100"> 
    <analyzer type="index"> 
    <tokenizer class="solr.WhitespaceTokenizerFactory"/> 
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" /> 
    <!-- in this example, we will only use synonyms at query time 
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/> 
    --> 
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/> 
    <filter class="solr.LowerCaseFilterFactory"/> 
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> 
    </analyzer> 
    <analyzer type="query"> 
    <tokenizer class="solr.WhitespaceTokenizerFactory"/> 
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" /> 
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> 
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1"/> 
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> 
    <filter class="solr.LowerCaseFilterFactory"/> 
    </analyzer> 
</fieldType> 

And my solrconfig.xml:

<requestHandler name="/select" class="solr.SearchHandler"> 

    <lst name="defaults"> 
    <str name="echoParams">explicit</str> 
    <int name="rows">10</int> 
    <str name="df">text</str> 
    <bool name="hl.usePhraseHighlighter">true</bool> 
</lst> 


</requestHandler> 
<!-- Highlighting Component 

    http://wiki.apache.org/solr/HighlightingParameters 
--> 
<searchComponent class="solr.HighlightComponent" name="highlight"> 
<highlighting> 
    <!-- Configure the standard fragmenter --> 
    <!-- This could most likely be commented out in the "default" case --> 
    <fragmenter name="gap" 
       default="true" 
       class="solr.highlight.GapFragmenter"> 
    <lst name="defaults"> 
     <int name="hl.fragsize">100</int> 
    </lst> 
    </fragmenter> 

    <!-- A regular-expression-based fragmenter 
     (for sentence extraction) 
    --> 
    <fragmenter name="regex" 
       class="solr.highlight.RegexFragmenter" default="true"> 
    <lst name="defaults"> 
     <!-- slightly smaller fragsizes work better because of slop --> 
     <int name="hl.fragsize">70</int> 
     <!-- allow 50% slop on fragment sizes --> 
     <float name="hl.regex.slop">0.5</float> 
     <!-- a basic sentence pattern --> 
     <str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str> 
     <bool name="hl.usePhraseHighlighter">true</bool> 
     <bool name="hl.highlightMultiTerm">true</bool> 
    </lst> 
    </fragmenter> 

    <!-- Configure the standard formatter --> 
    <formatter name="html" 
      default="true" 
      class="solr.highlight.HtmlFormatter"> 
    <lst name="defaults"> 
     <str name="hl.simple.pre"><![CDATA[<em>]]></str> 
     <str name="hl.simple.post"><![CDATA[</em>]]></str> 
    </lst> 
    </formatter> 

Thanks in advance for any help with the configuration,

實。

+0

Can anyone help me? Any ideas? Thanks!

Answers

0

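I am a beginner with Solr, but as far as I know, to match exact phrases it is better to use solr.NGramTokenizerFactory instead of WhitespaceTokenizerFactory in the index analyzer. Alternatively, you can try the highlighting option hl.mergeContiguous=true in the request (see Highlighter option). Hope it helps.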
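For reference, hl.mergeContiguous can also be set as a handler default instead of being passed with every request. The fragment below is only a minimal sketch. It assumes highlighting is switched on with hl=true and that the highlighted field is called content, which is a hypothetical name because the question does not show the field definitions. Solr documents this parameter as collapsing contiguous fragments into a single fragment, so it may not by itself merge adjacent highlighted terms.

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">text</str>
    <!-- turn highlighting on; the other hl.* parameters are ignored unless hl is true -->
    <bool name="hl">true</bool>
    <!-- "content" is a hypothetical field name; use the field you want highlighted -->
    <str name="hl.fl">content</str>
    <bool name="hl.usePhraseHighlighter">true</bool>
    <!-- collapse contiguous fragments into a single fragment -->
    <bool name="hl.mergeContiguous">true</bool>
  </lst>
</requestHandler>
```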

1

Check this post. You need to set hl.q="dulce hogar" together with the FastVectorHighlighter and the phrase highlighter.
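A minimal sketch of the setup this answer describes, assuming a Solr 4.x-era configuration and a hypothetical stored field named content (the question does not list its fields). The FastVectorHighlighter needs term vectors with positions and offsets on the highlighted field, and documents have to be reindexed after these attributes change.

```xml
<!-- schema.xml: "content" is a hypothetical field name -->
<field name="content" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>

<!-- solrconfig.xml: goes inside the /select requestHandler -->
<lst name="defaults">
  <bool name="hl">true</bool>
  <str name="hl.fl">content</str>
  <!-- switch from the standard highlighter to the FastVectorHighlighter -->
  <bool name="hl.useFastVectorHighlighter">true</bool>
  <bool name="hl.usePhraseHighlighter">true</bool>
</lst>
```

hl.q is normally sent per request rather than stored as a default, with the phrase in double quotes so that it is treated as a phrase query. From SolrJ this could look like `query.set("hl.q", "\"dulce hogar\"")` on a `SolrQuery` instance.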