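Apache Solr search returns too many results
I am using Apache Solr 5.1.
Solr holds an index of more than 13,000 pages of documents; I index the PDF documents with Apache Tika.
To improve search relevance I use the eDisMax parser, and it works well: the expected results come out on top.
However, for a single-word query, instead of only 3 results it returns more than 400. The 3 expected results are at the top, and the rest are irrelevant.
Here is the field type from my schema.xml that I use for almost all fields: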
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true" omitNorms="true">
<analyzer type="index">
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.TrimFilterFactory"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
<filter class="solr.KeywordRepeatFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
<filter class="solr.KeywordRepeatFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
</analyzer>
</fieldType>
Sample query parameters:
{
"responseHeader": {
"status": 0,
"QTime": 149,
"params": {
"mm": "100%",
"qs": "10",
"ps": "10",
"indent": "true",
"q.op": "AND",
"lowercaseOperators": "true",
"q": "b4u",
"defType": "edismax",
"qf": "story_title^5.0 tax_payer_name^3.0 judgement_text^1.0 story_description^1.0 nature_of_the_issues decision_summary additional_comments facts_of_the_case section_number case_law_citation",
"pf": "story_title^5.0 tax_payer_name^3.0 judgement_text^1.0 story_description^1.0 nature_of_the_issues decision_summary additional_comments facts_of_the_case section_number case_law_citation",
"wt": "json",
"stopwords": "true",
"_": "1468224236421"
}
},
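For anyone who wants to reproduce the request, here is a minimal sketch that rebuilds the same eDisMax query URL in Python. The host, port and core name (`localhost:8983`, `mycore`) and the helper name `build_query_url` are assumptions for illustration, not taken from my setup as posted. `debugQuery=true` is added as an option because it makes Solr report the parsed query. The `indent`, `lowercaseOperators` and `stopwords` parameters are left out for brevity.

```python
from urllib.parse import urlencode

# Hypothetical host and core name; adjust to the actual deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"

# Same field list as in the qf and pf parameters above.
FIELDS = (
    "story_title^5.0 tax_payer_name^3.0 judgement_text^1.0 "
    "story_description^1.0 nature_of_the_issues decision_summary "
    "additional_comments facts_of_the_case section_number case_law_citation"
)


def build_query_url(term, debug=False):
    """Build a select URL with the eDisMax parameters from the sample response."""
    params = {
        "q": term,
        "defType": "edismax",
        "qf": FIELDS,
        "pf": FIELDS,
        "mm": "100%",
        "q.op": "AND",
        "qs": "10",
        "ps": "10",
        "wt": "json",
    }
    if debug:
        # Ask Solr to include the parsed query and scoring explanation.
        params["debugQuery"] = "true"
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_query_url("b4u", debug=True)
print(url)
```

Opening the printed URL against a running core should show, in the debug section, which terms each field was actually searched for.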
Thanks in advance.
Could you share the query or the text you are searching for? –
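I tried searching for words like "B4U" or "IDS"; as for the phrase query, I have already posted the sample query. – Nilesh
You have WordDelimiterFilterFactory as a filter, and it generates a lot of tokens. You can verify this in the analysis tool with generateWordParts="1" generateNumberParts="1". You are also searching across many fields, so you will get a higher count in the results. – Nilesh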
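To illustrate the point about WordDelimiterFilterFactory, here is a rough Python sketch of how a term is broken up at letter/digit boundaries when generateWordParts="1" and generateNumberParts="1" are set. `split_like_word_delimiter` is a hypothetical helper written for this comment, not Lucene's implementation. It ignores stemming, catenation and case-change splitting, and only handles ASCII letters and digits, so the real token stream should still be checked in the Solr analysis tool.

```python
import re


def split_like_word_delimiter(token):
    """Approximate word/number part generation for one token.

    Lowercases the token, then returns each run of ASCII letters and each
    run of digits as a separate part. Any other character acts as a delimiter.
    """
    return re.findall(r"[a-z]+|[0-9]+", token.lower())


for term in ("b4u", "ids"):
    print(term, "->", split_like_word_delimiter(term))
```

Under this approximation "b4u" becomes the three parts "b", "4" and "u", while "ids" stays a single part. Very short parts such as a single letter or digit are likely to occur in many documents, which is one plausible reason for the high result count.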