Solr does not highlight some words

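I have configured Solr 4.10 (and also 5.3) with the highlighting functionality. It works fine for most words, but I have found some words that are not highlighted: Solr returns the expected documents but does not highlight some of the words in them.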

What could cause this behavior?

In solrconfig.xml:

<requestHandler name="/select" class="solr.SearchHandler"> 
<lst name="defaults"> 
    <str name="wt">json</str> 
    <str name="indent">true</str> 
    <str name="defType">edismax</str> 
    <str name="bf">product(concount)</str> 
    <str name="df">text bio text_syn text_syn_other</str> 
    <str name="qf"> 
    text^25 bio^16 text_syn^8 text_syn_other^3 
    </str> 
    <str name="hl">on</str> 
    <str name="hl.fl">text bio text_syn text_syn_other</str> 
    <str name="hl.preserveMulti">true</str> 
    <str name="hl.encoder">html</str> 
    <str name="f.text.hl.fragsize">100</str> 
    <str name="hl.snippets">20</str> 
    <arr name="components"> 
    <str>highlight</str> 
    </arr> 
</lst>
</requestHandler>

In schema.xml:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100"> 
    <analyzer type="index"> 
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" /> 
    <filter class="solr.LowerCaseFilterFactory"/> 
    <filter class="solr.EnglishPossessiveFilterFactory"/> 
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/> 
    <filter class="solr.PorterStemFilterFactory"/> 
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_abbr.txt" ignoreCase="true" expand="false"/> 
    </analyzer> 
    <analyzer type="query"> 
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" /> 
    <filter class="solr.LowerCaseFilterFactory"/> 
    <filter class="solr.EnglishPossessiveFilterFactory"/> 
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/> 
    <filter class="solr.PorterStemFilterFactory"/> 
    </analyzer> 
</fieldType> 

<fieldType name="text_en_syn" class="solr.TextField" positionIncrementGap="100"> 
    <analyzer type="index"> 
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" /> 
    <filter class="solr.LowerCaseFilterFactory"/> 
    <filter class="solr.EnglishPossessiveFilterFactory"/> 
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/> 
    <filter class="solr.PorterStemFilterFactory"/> 
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/> 
    </analyzer> 
    <analyzer type="query"> 
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" /> 
    <filter class="solr.LowerCaseFilterFactory"/> 
    <filter class="solr.EnglishPossessiveFilterFactory"/> 
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/> 
    <filter class="solr.PorterStemFilterFactory"/> 
    </analyzer> 
</fieldType> 

<fieldType name="text_en_syn_other" class="solr.TextField" positionIncrementGap="100"> 
    <analyzer type="index"> 
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" /> 
    <filter class="solr.LowerCaseFilterFactory"/> 
    <filter class="solr.EnglishPossessiveFilterFactory"/> 
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/> 
    <filter class="solr.PorterStemFilterFactory"/> 
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_other.txt" ignoreCase="true" expand="false"/> 
    </analyzer> 
    <analyzer type="query"> 
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" /> 
    <filter class="solr.LowerCaseFilterFactory"/> 
    <filter class="solr.EnglishPossessiveFilterFactory"/> 
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/> 
    <filter class="solr.PorterStemFilterFactory"/> 
    </analyzer> 
</fieldType> 

<field name="text" type="text_en" indexed="true" stored="true" multiValued="false" /> 
<field name="text_syn" type="text_en_syn" indexed="true" stored="false" multiValued="true" /> 
<field name="text_syn_other" type="text_en_syn_other" indexed="true" stored="false" multiValued="true" /> 

<field name="text_exact" type="string" indexed="true" stored="false" multiValued="false" /> 

<field name="bio" type="text_en" indexed="true" stored="true" multiValued="false" /> 

<field name="bio_exact" type="string" indexed="true" stored="false" multiValued="false" /> 

<field name="concount" type="long" indexed="true" stored="true" multiValued="false" /> 

<field name="concount_exact" type="long" indexed="true" stored="false" multiValued="false" /> 

<copyField source="text" dest="text_syn"/> 
<copyField source="bio" dest="text_syn"/> 
<copyField source="text" dest="text_syn_other"/> 
<copyField source="bio" dest="text_syn_other"/>

For the query http://localhost:8983/solr/select?q=senior I get documents containing the word senior, but the word is not highlighted in the highlighting section of the Solr response.

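Update 1: I found that the word senior appears in my synonyms_abbr.txt file, in the line senior,lead. When I comment out that line or swap the order of the words, the word senior surprisingly starts being highlighted. Any ideas?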

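Update 2: Words from synonyms.txt and synonyms_other.txt are highlighted normally, but words from synonyms_abbr.txt behave strangely, as follows. For example, with the line lead,head,senior in synonyms_abbr.txt:

the queries http://localhost:8983/solr/select?q=senior and http://localhost:8983/solr/select?q=head do not highlight any words,
the query http://localhost:8983/solr/select?q=lead highlights not only the word lead but also head and senior.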


2015-10-20 Mher

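Please use the analysis functionality in the Solr backend to see how the word is transformed. I am not sure how this particular word gets transformed. This may be a tricky problem. Otherwise, use a different field, turn off the transformations so that only the tokenizer is left, and then try highlighting from that field. – 0xCAFEBABE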
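A minimal sketch of that last suggestion, assuming a hypothetical field type text_plain and a hypothetical field text_plain_hl (neither exists in the original schema). It reuses the tokenizer from the question and drops every filter, so no lowercasing, stemming, or synonym handling takes place:

```xml
<!-- Hypothetical diagnostic field type: tokenizer only, no filters -->
<fieldType name="text_plain" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    </analyzer>
</fieldType>

<!-- Hypothetical stored copy of "text", used only for the highlighting test -->
<field name="text_plain_hl" type="text_plain" indexed="true" stored="true" multiValued="false" />

<copyField source="text" dest="text_plain_hl"/>
```

After reindexing, searching and highlighting on that field alone (for example with qf=text_plain_hl and hl.fl=text_plain_hl) should show whether the filters are what suppresses the highlight.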

@Mher Are stop words highlighted? Or is it just random? –

I don't have any stop words configured. The entire 'stopwords.txt' file is commented out. – Mher

From your Update 2 it is clear that only the first word in lead,head,senior is actually used for synonym matching and highlighting.

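If you look at the documentation on the Solr Wiki, https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters, it states that the expand parameter has a specific effect:

The synonyms parameter names an external file defining the synonyms. If ignoreCase is true, matching will lowercase before checking equality. If expand is true, a synonym will be expanded to all equivalent synonyms. If it is false, all equivalent synonyms will be reduced to the first in the list.

The page also gives an example: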

# If expand==true, "ipod, i-pod, i pod" is equivalent to the explicit mapping: 
ipod, i-pod, i pod => ipod, i-pod, i pod 
# If expand==false, "ipod, i-pod, i pod" is equivalent to the explicit mapping: 
ipod, i-pod, i pod => ipod

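This seems consistent with the behavior you are observing. It means you should either change the synonym filter definition in schema.xml to use expand=true, or change the synonyms file to use explicit mappings.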

Also, since this analyzer runs at index time, you will probably have to reindex your documents for the change to take effect.
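As a rough sketch of the first option, the text_en field type from the question could be changed as shown here. Only the expand attribute on the index-time synonym filter differs from the original; everything else is copied from the question's schema.xml. This assumes that synonyms_abbr.txt still contains the line lead,head,senior and that the documents are reindexed afterwards.

```xml
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <!-- expand="true": lead, head and senior are each indexed as all three terms,
         instead of being reduced to the first entry (lead) -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_abbr.txt" ignoreCase="true" expand="true"/>
    </analyzer>
    <!-- Query analyzer unchanged from the question -->
    <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\n,/\\]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
</fieldType>
```

For the second option the schema would stay as it is, and the line in synonyms_abbr.txt would instead be written as an explicit mapping in the style of the wiki example: lead, head, senior => lead, head, senior.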


2015-10-30 07:27:30 vvs

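Thanks for the explanation, it is very useful. Could you explain another part of the problem: for example, with 'expand = true', why is it that when I query a word from the synonyms_abbr.txt file, all synonyms of that word are highlighted along with the word itself, but when I pick a word from 'synonyms.txt', only that word is highlighted and not its synonyms? – Mher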

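Looking at your field definitions, it seems that the fields linked to synonyms.txt are configured with stored=false, so highlighting does not work for them. See ilinca's answer to this question. – vvs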

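Some of the fields are not stored, so they cannot be returned. Since they are indexed, they can still be searched. Change your schema to stored="true" for all the fields you want highlighted.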

<field name="text_syn" type="text_en_syn" indexed="true" stored="true" multiValued="true" /> 
<field name="text_syn_other" type="text_en_syn_other" indexed="true" stored="true" multiValued="true" />

Looking at your configuration, I assume highlighting works on the fields bio and text?


2015-10-23 13:58:10 ilinca

No @ilinca, highlighting does not work for the fields bio and text. I am not considering the fields that are not stored. – Mher

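Oh, sorry. The line text bio text_syn text_syn_other made me think you wanted highlights on the 2 unstored fields. I think we need an example of a field value, a query, and the result. – ilinca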

Can you try adding senior,lead and lead,senior to the synonyms_abbr.txt file and then run the highlighter again?


2015-10-29 11:43:07 user155806
