Solr - termfreq partial matching

I'm using Solr to query a set of documents, and I want to get the number of matches for a certain term. Right now I'm using:

termfreq(text,'manage')

However, this does not count Manager or Management.

termfreq(text,'manage*')

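returns the same count. I have tried different tokenizers, some of which don't even accept *, and I haven't found one that returns the correct number of matches.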
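A rough way to picture this behaviour: termfreq counts occurrences of one exact indexed term, and it does not expand wildcards. The Python sketch below is not Solr code. `analyze` is a hypothetical stand-in for the `text_general` analyzer (split on non-alphanumerics, lowercase), and it assumes the `*` is discarded like other punctuation, which would be one plausible explanation for both calls returning the same count.

```python
import re

def analyze(text):
    """Very rough stand-in for text_general: split on non-alphanumerics, lowercase."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def termfreq(doc_text, term):
    """Count exact occurrences of the analyzed term among the document's tokens."""
    tokens = analyze(doc_text)
    query_terms = analyze(term)  # assumption: '*' is dropped here, like punctuation
    if len(query_terms) != 1:
        return 0
    return tokens.count(query_terms[0])

doc = "The Manager will manage the Management team"
print(termfreq(doc, "manage"))   # 1
print(termfreq(doc, "manage*"))  # 1
```

Under these assumptions only the token "manage" itself is counted; "manager" and "management" are different terms in the index.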

Field:

<field name="text" type="text_general" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" required="false"/>

Is there a way to make termfreq count partial matches as well?

Source

2015-02-06 A.O.

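Be careful what you ask for... "partial matching" is generally not a good idea. For example, should "cup" also match "cups"? Or unrelated words such as "Cupertino"? A typical solution is to search on word "stems" (http://en.wikipedia.org/wiki/Stemming); is that what you are after? – 2015-02-09 20:48:56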

You will need to add some custom tokenizer and filter classes to the analyzer.

In your /shared/field_types.xml file, create a new type like this:

<fieldType name="text" class="solr.TextField" omitNorms="false"> 
    <analyzer> 
     <tokenizer class="solr.StandardTokenizerFactory"/> 
     <filter class="solr.StandardFilterFactory"/> 
     <filter class="solr.LowerCaseFilterFactory"/> 
    </analyzer> 
</fieldType>

And in /shared/fields.xml:

<field name="text" stored="true" type="text" multiValued="false" indexed="true"/> 
<dynamicField name="*_text" stored="true" type="text" multiValued="false" indexed="true"/>

Then use "text" as the type of the field.

A more advanced solution:

<fieldType name="startsWith" class="solr.TextField" positionIncrementGap="100"> 
    <analyzer type="index"> 
     <tokenizer class="solr.KeywordTokenizerFactory"/> 
     <filter class="solr.LowerCaseFilterFactory"/> 
     <!-- remove words/chars we don't care about --> 
     <filter class="solr.PatternReplaceFilterFactory" pattern="[^a-zA-Z0-9 ]" replacement="" replace="all"/> 
     <!-- now remove any extra space we have, since spaces WILL influence matching --> 
     <filter class="solr.PatternReplaceFilterFactory" pattern="\s+" replacement=" " replace="all"/> 
     <filter class="solr.TrimFilterFactory"/> 
     <filter class="solr.ASCIIFoldingFilterFactory"/> 
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50"/> 
    </analyzer> 
    <analyzer type="query"> 
     <tokenizer class="solr.KeywordTokenizerFactory"/> 
     <filter class="solr.LowerCaseFilterFactory"/> 
     <filter class="solr.PatternReplaceFilterFactory" pattern="[^a-zA-Z0-9 ]" replacement="" replace="all"/> 
     <filter class="solr.PatternReplaceFilterFactory" pattern="\s+" replacement=" " replace="all"/> 
     <filter class="solr.TrimFilterFactory"/> 
     <filter class="solr.ASCIIFoldingFilterFactory"/> 
    </analyzer> 
</fieldType>

In /shared/fields.xml:

<dynamicField name="*_starts_with" stored="true" type="startsWith" multiValued="false" indexed="true"/>
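To see what the index-time chain of the `startsWith` type puts in the index, here is a rough Python approximation (not Solr code). The function name `starts_with_index_terms` is made up for illustration, and ASCII folding is left out. Because `KeywordTokenizerFactory` keeps the whole field value as a single token, the edge n-grams are prefixes of the entire value, not of each word.

```python
import re

def starts_with_index_terms(value, min_gram=1, max_gram=50):
    """Approximate the index-time chain: one token for the whole value, then edge n-grams."""
    token = value.lower()                        # LowerCaseFilterFactory
    token = re.sub(r"[^a-zA-Z0-9 ]", "", token)  # first PatternReplaceFilterFactory
    token = re.sub(r"\s+", " ", token).strip()   # second PatternReplaceFilterFactory + TrimFilterFactory
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]

terms = starts_with_index_terms("The  Manager's Handbook!")
print(terms[:5])           # ['t', 'th', 'the', 'the ', 'the m']
print("the man" in terms)  # True
print("manager" in terms)  # False
```

In this sketch a lookup for "the man" finds a term, while "manager" does not, since it is not a prefix of the full value. This type therefore suits "field starts with" lookups rather than counting word prefixes inside longer text.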

Then, add this at the top level of your core's schema.xml:

<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="../../../shared/fields.xml"/> 
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="../../../shared/field_types.xml"/>

And add this to the copyFields in your core's schema.xml:

<copyFields> 
     <copyField source="yourField" dest="yourField_text"/> 
     <copyField source="yourField" dest="yourField_starts_with"/> 
     ... 
</copyFields>

Source

2015-02-09 18:55:14 pwnyexpress

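Both approaches cause termfreq to return a count of zero, rather than counting exact matches lol – 2015-02-09 21:29:48

Which field are you using with termfreq: yourField, or yourField_Text/yourField_starts_with? Note that these examples assume you have not actually named your indexed field "text". – pwnyexpress 2015-02-09 21:34:05

I copied your code exactly from the advanced solution. I'm using yourField (in my case the field is called 'textRaw'), and now it returns exact matches, but it still does not include partial matches. Using yourField_starts_with does not produce any matches. – 2015-02-09 21:53:01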

I had the same problem. I needed to compute termfreq in a way that also matches sub-parts of words. Adding this fieldType solved it.

<fieldType name="startWith" class="solr.TextField"> 
    <analyzer type="index"> 
    <tokenizer class="solr.StandardTokenizerFactory"/> 
    <filter class="solr.LowerCaseFilterFactory"/> 
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" /> 
    </analyzer> 
    <analyzer type="query"> 
    <tokenizer class="solr.StandardTokenizerFactory"/> 
    <filter class="solr.LowerCaseFilterFactory"/> 
    </analyzer> 
</fieldType>
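Why this works can be sketched in Python (a simulation, not Solr code). `index_terms` is a hypothetical stand-in for the index-time chain above: the text is split into words, each word is lowercased, and every prefix of 2 to 15 characters is indexed as its own term. The query side only lowercases, so an exact lookup of "manage" hits the prefix terms produced from "manager" and "management" as well.

```python
import re

def index_terms(text, min_gram=2, max_gram=15):
    """Approximate the index analyzer: per-word lowercase edge n-grams."""
    terms = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        token = token.lower()
        upper = min(max_gram, len(token))
        terms.extend(token[:n] for n in range(min_gram, upper + 1))
    return terms

def termfreq(text, term):
    """Count exact occurrences of the lowercased query term among the indexed terms."""
    return index_terms(text).count(term.lower())

doc = "The Manager will manage the Management team"
print(termfreq(doc, "manage"))  # 3
print(termfreq(doc, "man"))     # 3
print(termfreq(doc, "team"))    # 1
```

Note the trade-offs in this sketch: only matches at the start of a word are counted, query terms shorter than 2 characters never match, and words longer than 15 characters are not indexed in full.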

Source

2018-03-01 07:17:15 Gregor
