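Solr: using the spellchecker in an advanced Solr schema
I am trying to improve the spellcheck behavior on my project.
I spellcheck on only one field ("title"). The relevant configuration is at the bottom.
The reason for this configuration is to support objects whose titles contain special characters.
For example:
- Angry war: T70, the best tank in the world!
- Modern chipset M74K34#11 $$ 1 - A:B:C - 100500bestprices! Very cool object | title
To cover most cases, I configured the "solr.WordDelimiterFilterFactory" filter.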
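Problem: in the spellcheck results, when I search for a misspelled "angry birds", the suggestion I receive still has a special character attached instead of being the expected plain "angry birds". It would probably be enough to just trim the keywords by special characters (I mean split "Angry war: T70," => "angry", "war", "T70", "T", "70"). But how can I trim keywords by special characters? Or does anybody have a better idea?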
<field name="title" type="text_en" indexed="true" stored="true" required="true" multiValued="false"/>
where "text_en" is:
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="1" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>
In solrconfig.xml I use:
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="spellcheck.count">3</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str> <!-- index -->
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">title</str>
    <!-- <str name="field">default_search_field</str> -->
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
    <!-- <str name="distanceMeasure">internal</str> -->
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
    <float name="accuracy">0.7</float>
    <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
    <int name="maxEdits">2</int>
    <!-- the minimum shared prefix when enumerating terms -->
    <int name="minPrefix">1</int>
    <!-- maximum number of inspections per result. -->
    <int name="maxInspections">5</int>
    <!-- minimum length of a query term to be considered for correction -->
    <int name="minQueryLength">4</int>
    <!-- maximum threshold of documents a query term can appear to be considered for correction -->
    <float name="maxQueryFrequency">0.01</float>
    <!-- uncomment this to require suggestions to occur in 1% of the documents
    <float name="thresholdTokenFrequency">.01</float>
    -->
    <str name="buildOnCommit">false</str>
    <str name="buildOnOptimize">true</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <str name="comparatorClass">freq</str>
    <str name="collate">true</str>
    <str name="count">5</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <!-- <str name="classname">solr.DirectSolrSpellChecker</str> -->
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">title</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">5</int>
  </lst>
</searchComponent>
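[...] characters, but this removes (splits on) every ':' character, whereas here I need to remove only the trailing characters. Is that possible? Or maybe with a regular expression? – iMysak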
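Make a different field for spellchecking. Define an analysis chain for that field that produces cleaned-up terms, and use that field for the spellcheck. Keep searching with the text_en field. And yes, with a regular expression you can also remove trailing characters.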
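A rough sketch of what the suggestion above could look like, in case it helps. The names `text_spell` and `title_spell` are made up for illustration and are not part of the original configuration, and the regular expression is only one possible choice. The likely reason terms such as "war:" reach the dictionary is that `preserveOriginal="1"` keeps the whitespace-delimited token with its punctuation, so this sketch leaves WordDelimiterFilterFactory out of the spellcheck field and strips only the leading and trailing non-alphanumeric characters:

```xml
<!-- schema.xml: hypothetical field type used only to feed the spellchecker -->
<fieldType name="text_spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Strip leading and trailing characters that are neither letters nor digits:
         "war:" becomes "war" and "T70," becomes "T70".
         Characters inside a token (e.g. "A:B:C") are left untouched. -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$" replacement="" replace="all"/>
    <!-- Drop tokens that became empty after trimming, e.g. "$$" or "-" -->
    <filter class="solr.LengthFilterFactory" min="1" max="255"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical spellcheck-only copy of "title"; searching still uses "title" (text_en) -->
<field name="title_spell" type="text_spell" indexed="true" stored="false" multiValued="false"/>
<copyField source="title" dest="title_spell"/>

<!-- solrconfig.xml: inside the existing spellcheck searchComponent, the analyzer
     and the "field" of each spellchecker would then point at the new field -->
<str name="queryAnalyzerFieldType">text_spell</str>
<str name="field">title_spell</str>
```

The documents have to be reindexed after adding the copyField so that `title_spell` gets populated. No stemming filter is applied to the spellcheck field here, on the assumption that suggestions should be real words rather than stems.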