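I'm running a single-term query_string query across multiple fields, _all and tags.name, and trying to understand the scoring. The query: {"query":{"query_string":{"query":"animal","fields":["_all","tags.name"]}}}
Why is queryWeight included in some result scores, but not in others, within the same query? Here are the documents the query returns: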
- Document 1 has an exact match on tags.name, but not on _all.
- Document 8 has an exact match on both tags.name and _all.
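Document 8 should win, and it does, but I'm confused by how the scores work out. It looks as though document 1 is penalized by having its tags.name score multiplied by the IDF twice, whereas document 8's tags.name score is multiplied by the IDF only once. In short: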
- Both have a component weight(tags.name:animal in 0) [PerFieldSimilarity].
- In document 1, we have weight = score = queryWeight x fieldWeight.
- In document 8, we have weight = fieldWeight!
Since queryWeight contains idf, document 1 ends up being penalized by its idf twice.
Can anyone make sense of this?
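To make the double-IDF observation concrete, here is a small Python sketch that recomputes both tags.name scores from the numbers reported in the explain responses further down. The helper and variable names are mine, for illustration only; nothing here calls Elasticsearch.

```python
# Values copied from the tags.name clause of the two explain responses.
IDF = 0.30685282     # idf(docFreq=1, maxDocs=1)
QUERY_NORM = 1.0     # queryNorm reported for document 1


def field_weight(tf: float, idf: float, field_norm: float) -> float:
    """fieldWeight as broken down in the explain: tf * idf * fieldNorm."""
    return tf * idf * field_norm


# Document 1: weight = queryWeight * fieldWeight, so idf is applied twice.
query_weight = IDF * QUERY_NORM
doc1_score = query_weight * field_weight(1.0, IDF, 0.625)

# Document 8: weight = fieldWeight only, so idf is applied once.
doc8_score = field_weight(1.0, IDF, 0.5)

print(f"doc 1 tags.name: {doc1_score:.6f}")
print(f"doc 8 tags.name: {doc8_score:.6f}")
```

Both results agree with the explain values (0.058849156 and 0.15342641) up to float rounding, so the gap between the two documents comes from the extra idf factor and the different fieldNorm (0.625 vs 0.5).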
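Additional information
- If I remove the _all field from the query, queryWeight disappears from the explain entirely.
- Adding "use_dis_max":true as an option has no effect.
- However, additionally adding "tie_breaker":0.7 (or any value) does affect document 8, giving it the more complicated formula we see in document 1.
Thoughts: it's plausible that a boolean query (which this is) might do this on purpose, to give more weight to documents that match more than one sub-query. However, that makes no sense for a dis_max query, which is supposed to return just the maximum of the sub-queries.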
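For reference, a dis_max (disjunction-max) query is documented as scoring a document with the best sub-query score plus tie_breaker times the scores of the other matching sub-queries. A minimal Python sketch of that combination, fed with document 8's two sub-scores from the explain response below; dis_max_score is a hypothetical helper written for this post, not an Elasticsearch or Lucene API.

```python
def dis_max_score(sub_scores, tie_breaker=0.0):
    """Best sub-score plus tie_breaker times the sum of the remaining ones."""
    if not sub_scores:
        return 0.0
    best = max(sub_scores)
    return best + tie_breaker * (sum(sub_scores) - best)


# Sub-scores for document 8 as reported without a tie_breaker: _all, tags.name.
doc8_sub_scores = [0.033902764, 0.15342641]

print(dis_max_score(doc8_sub_scores))        # tie_breaker 0.0: plain maximum
print(dis_max_score(doc8_sub_scores, 0.7))   # formula only, see note below
```

With tie_breaker left at 0.0 this reproduces the top-level "max of:" value for document 8. The second call only illustrates the formula: as noted above, setting a tie_breaker also changes how the sub-scores themselves are explained, so the real score would not necessarily equal this number.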
The relevant explain requests follow. Look for the embedded comments.
Document 1 (matches only on tags.name):
curl -XGET 'http://localhost:9200/questions/question/1/_explain?pretty' -d '{"query":{"query_string":{"query":"animal","fields":["_all","tags.name"]}}}'
{
"ok" : true,
"_index" : "questions_1390104463",
"_type" : "question",
"_id" : "1",
"matched" : true,
"explanation" : {
"value" : 0.058849156,
"description" : "max of:",
"details" : [ {
"value" : 0.058849156,
"description" : "weight(tags.name:animal in 0) [PerFieldSimilarity], result of:",
// weight = score = queryWeight x fieldWeight
"details" : [ {
// score and queryWeight are NOT a part of the other explain!
"value" : 0.058849156,
"description" : "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
"details" : [ {
"value" : 0.30685282,
"description" : "queryWeight, product of:",
"details" : [ {
// This idf is NOT a part of the other explain!
"value" : 0.30685282,
"description" : "idf(docFreq=1, maxDocs=1)"
}, {
"value" : 1.0,
"description" : "queryNorm"
} ]
}, {
"value" : 0.19178301,
"description" : "fieldWeight in 0, product of:",
"details" : [ {
"value" : 1.0,
"description" : "tf(freq=1.0), with freq of:",
"details" : [ {
"value" : 1.0,
"description" : "termFreq=1.0"
} ]
}, {
"value" : 0.30685282,
"description" : "idf(docFreq=1, maxDocs=1)"
}, {
"value" : 0.625,
"description" : "fieldNorm(doc=0)"
} ]
} ]
} ]
} ]
}
}
Document 8 (matches on both _all and tags.name):
curl -XGET 'http://localhost:9200/questions/question/8/_explain?pretty' -d '{"query":{"query_string":{"query":"animal","fields":["_all","tags.name"]}}}'
{
"ok" : true,
"_index" : "questions_1390104463",
"_type" : "question",
"_id" : "8",
"matched" : true,
"explanation" : {
"value" : 0.15342641,
"description" : "max of:",
"details" : [ {
"value" : 0.033902764,
"description" : "btq, product of:",
"details" : [ {
"value" : 0.033902764,
"description" : "weight(_all:anim in 0) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 0.033902764,
"description" : "fieldWeight in 0, product of:",
"details" : [ {
"value" : 0.70710677,
"description" : "tf(freq=0.5), with freq of:",
"details" : [ {
"value" : 0.5,
"description" : "phraseFreq=0.5"
} ]
}, {
"value" : 0.30685282,
"description" : "idf(docFreq=1, maxDocs=1)"
}, {
"value" : 0.15625,
"description" : "fieldNorm(doc=0)"
} ]
} ]
}, {
"value" : 1.0,
"description" : "allPayload(...)"
} ]
}, {
"value" : 0.15342641,
"description" : "weight(tags.name:animal in 0) [PerFieldSimilarity], result of:",
// weight = fieldWeight
// No score or queryWeight in sight!
"details" : [ {
"value" : 0.15342641,
"description" : "fieldWeight in 0, product of:",
"details" : [ {
"value" : 1.0,
"description" : "tf(freq=1.0), with freq of:",
"details" : [ {
"value" : 1.0,
"description" : "termFreq=1.0"
} ]
}, {
"value" : 0.30685282,
"description" : "idf(docFreq=1, maxDocs=1)"
}, {
"value" : 0.5,
"description" : "fieldNorm(doc=0)"
} ]
} ]
} ]
}
}
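The idf and tf values in both explains are consistent with Lucene's classic TF-IDF similarity, where idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq). The sketch below assumes that similarity is in use (the explain output only says PerFieldSimilarity, so this is an inference from the numbers) and reproduces the reported values; the function names are illustrative.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Classic TF-IDF inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """Classic TF-IDF term frequency factor: sqrt(freq)."""
    return math.sqrt(freq)


idf = classic_idf(doc_freq=1, max_docs=1)

# _all clause of document 8: tf(freq=0.5) * idf * fieldNorm(0.15625)
all_field_weight = classic_tf(0.5) * idf * 0.15625

print(f"idf              = {idf:.8f}")
print(f"tf(0.5)          = {classic_tf(0.5):.8f}")
print(f"_all fieldWeight = {all_field_weight:.9f}")
```

The results match the explain values (0.30685282, 0.70710677 and 0.033902764) to within single-precision rounding.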
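Hi, did you ever find the answer yourself? Or do you have any sources to learn from? I'm suffering from the same lack of understanding. In our case this adversely affects some hits, and I need to understand why, and how to tune our query. – Jakub
No, unfortunately I never found an answer. I'd be curious to see what you hear back. – tmandry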