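Lucene: How to stop searching in the current document after the first match

The title may be a bit ambiguous, but bear with me (the only similar question I could find was "Solr: Search in multiple fields BUT STOP if documents match was found", and no solution was provided there). My Lucene documents have the following structure: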
FieldA (Store.YES, Index.ANALYZED), primary identification of an entity
FieldB (Store.YES, Index.ANALYZED), secondary identification(s) of an entity
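FieldA could, for example, contain something like car, whereas FieldB could contain strings like automobile, vehicle, and so on. A document can have multiple FieldB fields, each holding one string. The index analyzer is StandardAnalyzer and the search analyzer is KeywordAnalyzer (this seems to produce the best results; I am not sure whether it is the best approach). Identifiers in FieldA are more important than identifiers in FieldB.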
Suppose the index contains 3 documents (shown as FieldA | FieldB):
"car" | "vehicle" "automobile"
"car parts" | "parts, car"
"car shop" | "shop, car"
So far, so good. Now the crux of the problem:
When querying for "car", I would like to see the following results (ordered by score):
car, score 1.0
car parts, score 0.9
car shop, score 0.9
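The document whose FieldA value is "car" should show up first, because FieldA is considered more important and the query matches that value best. In reality, the following happens: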
car parts, score 0.625
car shop, score 0.625
car, score 0.5073969
searcher.explain() outputs the following (the explanation for "car shop" is left out because it is identical to the one for "car parts"):
Explain: 0.625 = (MATCH) max of:
  0.31712303 = (MATCH) weight(fielda:car in 0), product of:
    0.71231794 = queryWeight(fielda:car), product of:
      0.71231794 = idf(docFreq=3, maxDocs=3)
      1.0 = queryNorm
    0.4451987 = (MATCH) fieldWeight(fielda:car in 0), product of:
      1.0 = tf(termFreq(fielda:car)=1)
      0.71231794 = idf(docFreq=3, maxDocs=3)
      0.625 = fieldNorm(field=fielda, doc=0)
  0.625 = (MATCH) fieldWeight(fieldb:car in 0), product of:
    1.0 = tf(termFreq(fieldb:car)=1)
    1.0 = idf(docFreq=2, maxDocs=3)
    0.625 = fieldNorm(field=fieldb, doc=0)

Explain: 0.5073969 = (MATCH) max of:
  0.5073969 = (MATCH) weight(fielda:car in 0), product of:
    0.71231794 = queryWeight(fielda:car), product of:
      0.71231794 = idf(docFreq=3, maxDocs=3)
      1.0 = queryNorm
    0.71231794 = (MATCH) fieldWeight(fielda:car in 0), product of:
      1.0 = tf(termFreq(fielda:car)=1)
      0.71231794 = idf(docFreq=3, maxDocs=3)
      1.0 = fieldNorm(field=fielda, doc=0)
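
To make the numbers in the explain output easier to follow, here is a minimal plain-Java sketch that redoes the arithmetic by hand. It does not call Lucene. The class and method names (ExplainArithmetic, idf, clauseScore) are made up for illustration, the idf formula 1 + ln(maxDocs / (docFreq + 1)) is assumed because it reproduces the printed idf values, and the fieldNorm and queryNorm values are copied as-is from the output.

```java
public class ExplainArithmetic {
    // Assumed idf formula; it reproduces the idf values printed by explain().
    static double idf(int docFreq, int maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    // Score of one term clause with tf = 1: queryWeight * fieldWeight.
    static double clauseScore(double idf, double queryNorm, double fieldNorm) {
        double queryWeight = idf * queryNorm;
        double fieldWeight = 1.0 * idf * fieldNorm; // tf * idf * fieldNorm
        return queryWeight * fieldWeight;
    }

    // "car parts" (and "car shop"): both fields match, "max of:" picks the larger clause.
    static double carPartsScore() {
        double fieldA = clauseScore(idf(3, 3), 1.0, 0.625); // fielda:car, in all 3 docs
        double fieldB = clauseScore(idf(2, 3), 1.0, 0.625); // fieldb:car, in 2 docs
        return Math.max(fieldA, fieldB);
    }

    // "car": only fielda matches, with fieldNorm 1.0.
    static double carScore() {
        return clauseScore(idf(3, 3), 1.0, 1.0);
    }

    public static void main(String[] args) {
        System.out.println("car parts: " + carPartsScore());
        System.out.println("car:       " + carScore());
    }
}
```

Seen this way, the FieldB clause of "car parts" wins the max because its idf is higher (the term is rarer in FieldB), and that clause alone already outscores the only clause that the "car" document matches.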
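TL;DR: Because the term occurs in both fields, boosting FieldA will not help, since all 3 documents would be boosted. How can I make Lucene rank the closest match ("car" in this example) highest? In other words, how do I stop searching in the current document once a match in FieldA (the more important field) has been found?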
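
To pin down the behaviour being asked for, here is a small plain-Java sketch of the intended "first matching field wins" rule, applied to the per-field scores from the explain output. It is not Lucene API and not a solution inside Lucene; the names FirstMatchRanking, Hit, and firstMatchScore are hypothetical, and NaN is used here only as an illustrative marker for "this field did not match".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FirstMatchRanking {
    // Hypothetical holder for per-field scores; NaN means the field did not match.
    static final class Hit {
        final String name;
        final double fieldAScore;
        final double fieldBScore;

        Hit(String name, double fieldAScore, double fieldBScore) {
            this.name = name;
            this.fieldAScore = fieldAScore;
            this.fieldBScore = fieldBScore;
        }
    }

    // If FieldA matched, ignore FieldB entirely; otherwise fall back to FieldB.
    static double firstMatchScore(Hit hit) {
        return Double.isNaN(hit.fieldAScore) ? hit.fieldBScore : hit.fieldAScore;
    }

    // Sort by descending first-match score; List.sort is stable, so ties keep input order.
    static List<String> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> Double.compare(firstMatchScore(b), firstMatchScore(a)));
        List<String> names = new ArrayList<>();
        for (Hit hit : sorted) {
            names.add(hit.name);
        }
        return names;
    }

    // Per-field scores copied from the explain output in the question.
    static List<String> exampleRanking() {
        return rank(Arrays.asList(
                new Hit("car parts", 0.31712303, 0.625),
                new Hit("car shop", 0.31712303, 0.625),
                new Hit("car", 0.5073969, Double.NaN)));
    }

    public static void main(String[] args) {
        System.out.println(exampleRanking());
    }
}
```

Under this rule the FieldB clause no longer lifts "car parts" and "car shop" above "car", which is the ordering described as desired; the absolute scores would still differ from the 1.0 / 0.9 values given as an illustration.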
This solves the problem of the term occurring in both fields, but the underlying problem still remains. – NoMoreMrCodeGuy 2012-01-09 09:25:19