Searching for special characters in Solr

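I am having a problem searching for special characters in Solr. My documents have a field "Title", and sometimes its value looks like "Titanic - 1999" (it contains the character "-"). When I try to search for "-" in Solr I get a 400 error. I tried to escape the character, so I tried things like "\-" and "\\-". With those changes Solr no longer responds with an error, but it returns 0 results.

How can I search in the Solr admin for special characters (such as "-" or "'")?

Regards

UPDATE: Here you can see my current Solr schema: https://gist.github.com/cpalomaresbazuca/6269375

I am searching on the "Title" field from schema.xml.

Excerpt: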

... 
<!-- A general text field that has reasonable, generic 
    cross-language defaults: it tokenizes with StandardTokenizer, 
    removes stop words from case-insensitive "stopwords.txt" 
    (empty by default), and down cases. At query time only, it 
    also applies synonyms. --> 
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100"> 
     <analyzer type="index"> 
      <tokenizer class="solr.StandardTokenizerFactory"/> 
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" /> 
      <!-- in this example, we will only use synonyms at query time 
      <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/> 
      --> 
      <filter class="solr.LowerCaseFilterFactory"/> 

     </analyzer> 
     <analyzer type="query"> 
      <tokenizer class="solr.StandardTokenizerFactory"/> 
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" /> 
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> 
      <filter class="solr.LowerCaseFilterFactory"/> 

     </analyzer> 
    </fieldType> 
... 
<field name="Title" type="text_general" indexed="true" stored="true"/>


2013-08-16 shinjidev

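When you search, do you wrap the value in quotes? Like select?q=title:"Titanic - 1999". Putting it in quotes should do an exact search.

What does your schema look like for this field? I would like to know how you have defined it.

<field name="Title" type="text_general" stored="true" indexed="true" />

To search for your exact phrase, put quotes around it: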

select?q=title:"Titanic - 1999"

If you only want to search for the special character itself, you need to escape it:

select?q=title:\-
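The escaping step can also be done in client code, which avoids hand-editing the URL. The following is a minimal sketch, not an official Solr API: the helper names `escape_solr_term` and `build_select_url`, the base URL, and the core name `collection1` are all assumptions made for illustration. The character set is the usual list of query-syntax characters for the Lucene/Solr standard query parser.

```python
from urllib.parse import urlencode

# Characters the standard Lucene/Solr query parser treats as syntax.
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&;/')


def escape_solr_term(text: str) -> str:
    """Prefix every query-syntax character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


def build_select_url(base_url: str, field: str, term: str) -> str:
    """Build a /select URL whose q parameter is escaped and percent-encoded."""
    query = f"{field}:{escape_solr_term(term)}"
    return f"{base_url}/select?{urlencode({'q': query})}"


if __name__ == "__main__":
    # The base URL and core name are placeholders; adjust to your installation.
    print(build_select_url("http://localhost:8983/solr/collection1", "title", "-"))
```

Escaping only gets the query past the parser. Whether it matches anything still depends on whether the character survived analysis at index time.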

Also check: Special characters (-&+, etc) not working in SOLR Query

If you know exactly which special characters you do not want to use, you can add this to regex-normalize.xml:

<regex> 
    <pattern>&#x2D;</pattern> 
    <substitution>%2D</substitution> 
</regex>

This replaces every "-" with %2D, so when you search, search for %2D instead of "-" and it should work.


2013-08-19 14:23:09

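I have already tried select?q=title:\- but it still returns 0 results :( How can I tell whether the "-" character was not indexed? – shinjidev

Try what I suggested in the second half: change regex-normalize.xml. I tried it myself and it works perfectly.

Sorry for the question, but where can I find this file??? I can't find it – shinjidev

You are using the standard text_general field type for the title attribute. This may not be a good choice. text_general is intended for larger amounts of text (or at least sentences), not for exact matching of names or titles.

The problem here is that text_general uses StandardTokenizerFactory.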

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100"> 
     <analyzer type="index"> 
      <tokenizer class="solr.StandardTokenizerFactory"/> 
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" /> 
      <!-- in this example, we will only use synonyms at query time 
      <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/> 
      --> 
      <filter class="solr.LowerCaseFilterFactory"/> 

     </analyzer> 
     <analyzer type="query"> 
      <tokenizer class="solr.StandardTokenizerFactory"/> 
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" /> 
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> 
      <filter class="solr.LowerCaseFilterFactory"/> 

     </analyzer> 
    </fieldType>

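StandardTokenizerFactory does the following:

A good general purpose tokenizer that strips many extraneous characters and sets token types to meaningful values. Token types are only useful for subsequent token filters that are type-aware of the same token types.

This means the '-' character is dropped entirely and only serves to split the string into tokens.

"kong-fu" will be represented as "kong" and "fu". The '-' disappears.

This also explains why select?q=title:\- does not work here.

Choose a better-fitting field type:

Instead of StandardTokenizerFactory you could use solr.WhitespaceTokenizerFactory, which splits only on whitespace, so words are matched exactly. Creating your own field type for the title attribute would therefore be one solution.

Solr also has a minimal field type called text_ws. Depending on your requirements, this may be enough.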
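To see why the tokenizer choice matters, here is a rough Python sketch that compares the two splitting strategies on the title from the question. It is only an approximation: the regular expression `\w+` is a stand-in for StandardTokenizer (which actually follows Unicode word-break rules), and both function names are hypothetical. Lowercasing is applied in both cases to mimic a LowerCaseFilterFactory, which is an assumption about the field type you would build.

```python
import re


def rough_standard_tokens(text: str) -> list:
    """Approximate StandardTokenizer + lowercase: punctuation is discarded."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def whitespace_tokens(text: str) -> list:
    """Approximate WhitespaceTokenizer + lowercase: split on whitespace only."""
    return [token.lower() for token in text.split()]


if __name__ == "__main__":
    title = "Titanic - 1999"
    print(rough_standard_tokens(title))  # the "-" never becomes a token
    print(whitespace_tokens(title))      # the "-" is kept as its own token
```

With the first strategy no "-" token is ever indexed, so a query for it has nothing to match. With whitespace splitting the "-" is kept as its own token, and "kong-fu" stays a single token.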


2015-03-02 18:20:02 jHilscher

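I spent a lot of time getting this to work. Here is a step-by-step approach to querying special characters in Solr. I hope it helps someone.

Edit the schema.xml file and find the solr.TextField that you are using.

In both the "index" and "query" analyzers, modify the WordDelimiterFilterFactory and add types="characters.txt", something like: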

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true"> 
<analyzer type="index"> 
<tokenizer class="solr.WhitespaceTokenizerFactory"/> 
<filter catenateAll="0" catenateNumbers="0" catenateWords="0" class="solr.WordDelimiterFilterFactory" generateNumberParts="1" generateWordParts="1" splitOnCaseChange="1" types="characters.txt"/> 
</analyzer> 
<analyzer type="query"> 
<tokenizer class="solr.WhitespaceTokenizerFactory"/> 
<filter catenateAll="0" catenateNumbers="0" catenateWords="0" class="solr.WordDelimiterFilterFactory" generateNumberParts="1" generateWordParts="1" splitOnCaseChange="1" types="characters.txt"/> 
</analyzer> 
</fieldType>

Make sure you use WhitespaceTokenizerFactory as the tokenizer, as shown above.

Your characters.txt file can have entries like:

\# => ALPHA 
@ => ALPHA 
\u0023 => ALPHA 
       ie:- pointing to ALPHA only.

Clear the data, reindex, and query for the characters you entered. It will work.


2016-07-27 07:51:45 zorze
