Solr: Ignoring the casing of strings when calculating facet counts
I have these values in my database for the title field:
"I Am A String"
"I am A string"
I want to facet on the title field in my search results.
Current result:
<lst name="title">
<int name="I Am A String">4</int>
<int name="I am A string">3</int>
</lst>
Desired result:
<lst name="title">
<int name="I Am A String">7</int>
</lst>
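I don't really care which of the 2 available strings is chosen for the final result, as long as identical strings (compared case-insensitively) are counted toward the same facet.
I tried the following field type definitions for the title field. I have also noted the facet behavior each one produced.
string = treats different casings as different strings
string_exact = treats different casings as different strings
text_ws = breaks the value into separate words, casing intact
text = breaks the value into separate words
textTight = breaks the value into separate words
textTrue = breaks the value into words, casing intact
string_exacttest = breaks the value into words, casing intact
Here is my schema.xml: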
<field name="title" type="string" indexed="true" stored="true"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true" />
<fieldType name="string_exact" class="solr.TextField"
sortMissingLast="true" omitNorms="true">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
</analyzer>
</fieldType>
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>
</fieldType>
<!-- A text field that uses WordDelimiterFilter to enable splitting and matching of words on case-change, alpha numeric boundaries, and non-alphanumeric chars, so that a query of "wifi" or "wi fi" could match a document containing "Wi-Fi".
Synonyms and stopwords are customized by external files, and stemming is enabled. Duplicate tokens at the same position (which may result from Stemmed Synonyms or WordDelim parts) are removed.-->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
<!-- Less flexible matching, but fewer false matches. Probably not ideal for product names, but may be good for SKUs. Can insert dashes in the wrong place and still match. -->
<fieldType name="textTight" class="solr.TextField" positionIncrementGap="100" >
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt" />
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="Dutch" protected="protwords.txt"/>
<!--
this filter can remove any duplicate tokens that appear at the same position - sometimes possible with WordDelimiterFilter in conjunction with
stemming.
-->
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="textTrue" class="solr.TextField" positionIncrementGap="100" >
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt" />
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="Dutch" protected="protwords.txt"/>
</analyzer>
</fieldType>
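How can I make sure that identical strings (ignoring case) are grouped together when the facets are calculated?
Hi, thanks. But I don't want it to be lowercased. I want to preserve the casing. See the post itself: I know the strings are cased differently, so I want to pick any one of the casing variants and use that. How would I do that? – Flo
But that is actually what you're saying you want: you want the counts to be unaffected by casing, which means you have to index the tokens with the same casing. If you don't, they will be different tokens and will therefore be counted separately. I would either normalize the casing when indexing the facet field, according to a rule for that field (for example normal sentence casing, "I am a .."), or I think you will have to go with your first version, iterate over the result and merge the entries manually (.. or retrieve two facets and look up the cased version from the second one; the first option is more efficient) – MatsLindh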
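The "iterate over the result and merge the entries manually" option could look roughly like the sketch below. It is only an illustration: the function name `merge_facets_ignoring_case` is made up, and it assumes the facet counts have already been parsed out of the Solr response into `(label, count)` pairs. The first casing seen for each label is kept as the display label, which matches the "any one of the variants is fine" requirement in the question.

```python
def merge_facets_ignoring_case(facet_counts):
    """Merge facet entries whose labels differ only by case.

    facet_counts: iterable of (label, count) pairs, in the order Solr returned them.
    The first-seen casing of each label is kept as the display label.
    (Hypothetical helper, not part of any Solr client library.)
    """
    merged = {}  # case-folded label -> [display label, running total]
    for label, count in facet_counts:
        key = label.casefold()
        if key in merged:
            merged[key][1] += count
        else:
            merged[key] = [label, count]
    # Merging can change the ranking, so sort again by count, highest first.
    return sorted(
        ((label, total) for label, total in merged.values()),
        key=lambda pair: pair[1],
        reverse=True,
    )


# The two entries from the question collapse into one.
facets = [("I Am A String", 4), ("I am A string", 3)]
print(merge_facets_ignoring_case(facets))  # [('I Am A String', 7)]
```

One caveat with merging on the client: the totals are only correct if every casing variant is actually present in the response, so a low `facet.limit` can cut off variants and produce undercounts.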