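I have indexed e-commerce data with a field named PartNumber (item number), and the data is refreshed every day. Solr's top-ranked results do not exactly match the term 101; it returns S101 first.
The field type in Solr is text, because the field may contain digits, letters, or special characters such as dashes.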
When I search with a term like this:
192.168.XX:XX/solr/keyword/select?q=(PartNumber:101)^2.0+OR+(101)&start=0&rows=20&spellcheck=true&version=2.2&debug=true&fl=*,score
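When you reproduce a request like this outside the browser, it helps to let a library encode the parameters so characters like `^`, `(`, `:` and spaces arrive intact. The sketch below uses only Python's standard library. The base URL (`localhost:8983`, core name `keyword`) is a hypothetical placeholder, since the original address is elided.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical base URL: host, port and core name are placeholders, not the original.
BASE = "http://localhost:8983/solr/keyword/select"


def build_select_url(base, q, rows=20):
    """Build a /select URL with the same parameters as the query above."""
    params = {
        "q": q,
        "start": 0,
        "rows": rows,
        "spellcheck": "true",
        "version": "2.2",
        "debug": "true",
        "fl": "*,score",
    }
    # urlencode percent-encodes ^ ( ) : * , and turns spaces into '+'
    return base + "?" + urlencode(params)


url = build_select_url(BASE, "(PartNumber:101)^2.0 OR (101)")
print(url)
```

Decoding the query string with `parse_qs` gives back exactly the `q` value that was passed in, which is a quick way to confirm that the boost and the `OR` survived encoding.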
The query returns these results (first 20 rows):
- S101
- 101S
- 101U
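I have also tried PartNumber:"101" and plain 101. The result is always the same: 101 is never ranked first.
Note: if the term is 4 or more characters long (5000, 16400, K5125, etc.), the top results are better, and the exact match usually comes first.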
Some of the debug output:
<lst name="debug">
<str name="rawquerystring">(PartNumber:101)^2.0 OR (101)</str><str name="querystring">(PartNumber:101)^2.0 OR (101)</str>
<str name="parsedquery">PhraseQuery(PartNumber:"1 10 101"^2.0) PhraseQuery(text:"1 10 101")</str>
<str name="parsedquery_toString">PartNumber:"1 10 101"^2.0 text:"1 10 101"</str><lst name="explain">
<str name="40541432">
6.7604995 = (MATCH) sum of:
5.1748066 = (MATCH) weight(PartNumber:"1 10 101"^2.0 in 492450) [DefaultSimilarity], result of:
5.1748066 = score(doc=492450,freq=1.0 = phraseFreq=1.0
), product of:
0.91124594 = queryWeight, product of:
2.0 = boost
11.357651 = idf(), sum of:
1.5469646 = idf(docFreq=797168, maxDocs=1377508)
3.6602204 = idf(docFreq=96332, maxDocs=1377508)
6.1504664 = idf(docFreq=7984, maxDocs=1377508)
0.040115952 = queryNorm
5.6788254 = fieldWeight in 492450, product of:
1.0 = tf(freq=1.0), with freq of:
1.0 = phraseFreq=1.0
11.357651 = idf(), sum of:
1.5469646 = idf(docFreq=797168, maxDocs=1377508)
3.6602204 = idf(docFreq=96332, maxDocs=1377508)
6.1504664 = idf(docFreq=7984, maxDocs=1377508)
0.5 = fieldNorm(doc=492450)
1.5856929 = (MATCH) weight(text:"1 10 101" in 492450) [DefaultSimilarity], result of:
1.5856929 = score(doc=492450,freq=4.0 = phraseFreq=4.0
), product of:
0.4118627 = queryWeight, product of:
10.266806 = idf(), sum of:
1.407141 = idf(docFreq=916800, maxDocs=1377508)
3.1487658 = idf(docFreq=160655, maxDocs=1377508)
5.7108994 = idf(docFreq=12392, maxDocs=1377508)
0.040115952 = queryNorm
3.850052 = fieldWeight in 492450, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = phraseFreq=4.0
10.266806 = idf(), sum of:
1.407141 = idf(docFreq=916800, maxDocs=1377508)
3.1487658 = idf(docFreq=160655, maxDocs=1377508)
5.7108994 = idf(docFreq=12392, maxDocs=1377508)
0.1875 = fieldNorm(doc=492450)
</str>
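The explain tree above follows the classic TF-IDF formula named in the output (`[DefaultSimilarity]`): each clause scores `queryWeight * fieldWeight`, where `queryWeight = boost * idf * queryNorm` and `fieldWeight = tf * idf * fieldNorm`, with `tf = sqrt(phraseFreq)`, `idf = 1 + ln(maxDocs / (docFreq + 1))`, and a phrase's idf being the sum of its terms' idfs. The sketch below recomputes the score for doc 492450 from the numbers shown. `queryNorm` and `fieldNorm` are copied from the output rather than derived, and the helper names are illustrative only.

```python
import math


def idf(doc_freq, max_docs):
    # DefaultSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def phrase_clause_score(doc_freqs, max_docs, phrase_freq, boost, query_norm, field_norm):
    # A phrase query's idf is the sum of the idfs of its terms.
    phrase_idf = sum(idf(df, max_docs) for df in doc_freqs)
    query_weight = boost * phrase_idf * query_norm
    tf = math.sqrt(phrase_freq)  # tf(freq) = sqrt(freq)
    field_weight = tf * phrase_idf * field_norm
    return query_weight * field_weight


MAX_DOCS = 1377508
QUERY_NORM = 0.040115952  # copied from the explain output

# PartNumber:"1 10 101"^2.0 clause: phraseFreq=1.0, fieldNorm=0.5
part_number = phrase_clause_score([797168, 96332, 7984], MAX_DOCS, 1.0, 2.0, QUERY_NORM, 0.5)
# text:"1 10 101" clause: phraseFreq=4.0, fieldNorm=0.1875
text = phrase_clause_score([916800, 160655, 12392], MAX_DOCS, 4.0, 1.0, QUERY_NORM, 0.1875)

total = part_number + text
print(f"{part_number:.4f} + {text:.4f} = {total:.4f}")
```

Within each phrase, the gram "1" contributes the smallest idf (it occurs in 797168 of 1377508 documents) and "101" the largest, so the score depends mainly on `fieldNorm` and `phraseFreq`, not on whether the stored value is exactly 101.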
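What are the field type and definition of PartNumber? The problem doesn't look like it is in your query. It looks like it is in how the field is analyzed at index time and query time (are you using n-grams or something similar?). – Arun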
The field type is the default one, the same as in the file: – OldTrain
Analysis chain: **At index time:** StopFilterFactory, WordDelimiterFilterFactory, LowerCaseFilterFactory, KeywordMarkerFilterFactory, PorterStemFilterFactory, RemoveDuplicatesTokenFilterFactory **At query time:** WhitespaceTokenizerFactory, SynonymFilterFactory, StopFilterFactory, WordDelimiterFilterFactory, LowerCaseFilterFactory, KeywordMarkerFilterFactory, PorterStemFilterFactory, RemoveDuplicatesTokenFilterFactory – OldTrain
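Neither chain listed above contains an n-gram filter, yet the parsed query `PartNumber:"1 10 101"` looks exactly like the edge n-grams of `101`. The sketch below assumes that an edge n-gram step (min gram 1) is present somewhere in the real chain. That step is a hypothesis inferred from the debug output, not a confirmed configuration. The word-delimiter step is heavily simplified (split letter/digit runs, drop punctuation), and token positions are ignored. Under these assumptions, S101, 101S and 101U all contain the same phrase as 101, so matching alone cannot push the exact value to the top.

```python
import re


def word_delimiter(token):
    # Simplified stand-in for WordDelimiterFilter: drop punctuation such as '-'
    # and split on letter/digit transitions ("S101" -> ["S", "101"]).
    return re.findall(r"[A-Za-z]+|[0-9]+", token)


def edge_ngrams(token, min_gram=1, max_gram=15):
    # HYPOTHETICAL edge n-gram step, inferred from the parsed query "1 10 101".
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def analyze(value):
    tokens = []
    for raw in value.split():  # whitespace tokenizer
        for part in word_delimiter(raw):
            tokens.extend(edge_ngrams(part.lower()))  # lowercase, then n-grams
    return tokens


def contains_phrase(tokens, phrase):
    # True if `phrase` occurs as a contiguous run inside `tokens`.
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


query_tokens = analyze("101")
for value in ["101", "S101", "101S", "101U"]:
    tokens = analyze(value)
    print(value, tokens, contains_phrase(tokens, query_tokens))
```

Running the same comparison with the Analysis screen in the Solr admin UI against the real field type would show whether such an n-gram step actually exists, and at which stage it runs.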