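Elasticsearch - documents that match more conditions get a lower score than documents that match fewer
I have a query that should return profiles with similar interests. The problem is that documents matching more conditions get a lower score.
In a bool query, I have a should clause with interests = ['games', 'music', 'sport'].
The document with interests = ['games'] gets a score of 0.14981213, while the document with interests = ['games', 'music'] gets a score of 0.11516824.
Why? I am using AWS Elasticsearch, v. 2.3.2.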
The query looks like this:
{
"explain": true,
"from": 0,
"query": {
"bool": {
"filter": [
{
"bool": {
"must_not": [
{
"term": {
"id": 3918
}
}
]
}
}
],
"should": [
{
"terms": {
"interests": [
"games",
"music",
"sport"
]
}
}
]
}
},
"size": 10
}
As a result, I get:
{
"_shards": {
"failed": 0,
"successful": 5,
"total": 5
},
"hits": {
"hits": [
{
"_explanation": {
"description": "sum of:",
"details": [
{
"description": "match on required clause, product of:",
"details": [
{
"description": "# clause",
"details": [],
"value": 0.0
},
{
"description": "-id:`\b\u0000\u0000\u001eN #*:*, product of:",
"details": [
{
"description": "boost",
"details": [],
"value": 1.0
},
{
"description": "queryNorm",
"details": [],
"value": 0.4494364
}
],
"value": 0.4494364
}
],
"value": 0.0
},
{
"description": "product of:",
"details": [
{
"description": "sum of:",
"details": [
{
"description": "weight(interests:games in 1) [PerFieldSimilarity], result of:",
"details": [
{
"description": "score(doc=1,freq=1.0), product of:",
"details": [
{
"description": "queryWeight, product of:",
"details": [
{
"description": "idf(docFreq=2, maxDocs=3)",
"details": [],
"value": 1.0
},
{
"description": "queryNorm",
"details": [],
"value": 0.4494364
}
],
"value": 0.4494364
},
{
"description": "fieldWeight in 1, product of:",
"details": [
{
"description": "tf(freq=1.0), with freq of:",
"details": [
{
"description": "termFreq=1.0",
"details": [],
"value": 1.0
}
],
"value": 1.0
},
{
"description": "idf(docFreq=2, maxDocs=3)",
"details": [],
"value": 1.0
},
{
"description": "fieldNorm(doc=1)",
"details": [],
"value": 1.0
}
],
"value": 1.0
}
],
"value": 0.4494364
}
],
"value": 0.4494364
}
],
"value": 0.4494364
},
{
"description": "coord(1/3)",
"details": [],
"value": 0.33333334
}
],
"value": 0.14981213
}
],
"value": 0.14981213
},
"_id": "3917",
"_index": "test_44024988_profiles",
"_node": "urWXg5KhREyffYielaa6Rw",
"_score": 0.14981213,
"_shard": 2,
"_source": {
"full_name": "Bob Doe",
"id": 3916,
"interests": [
"games"
],
"user_id": 3917
},
"_type": "profile_document"
},
{
"_explanation": {
"description": "sum of:",
"details": [
{
"description": "match on required clause, product of:",
"details": [
{
"description": "# clause",
"details": [],
"value": 0.0
},
{
"description": "-id:`\b\u0000\u0000\u001eN #*:*, product of:",
"details": [
{
"description": "boost",
"details": [],
"value": 1.0
},
{
"description": "queryNorm",
"details": [],
"value": 0.9173473
}
],
"value": 0.9173473
}
],
"value": 0.0
},
{
"description": "product of:",
"details": [
{
"description": "sum of:",
"details": [
{
"description": "weight(interests:games in 0) [PerFieldSimilarity], result of:",
"details": [
{
"description": "score(doc=0,freq=1.0), product of:",
"details": [
{
"description": "queryWeight, product of:",
"details": [
{
"description": "idf(docFreq=1, maxDocs=1)",
"details": [],
"value": 0.30685282
},
{
"description": "queryNorm",
"details": [],
"value": 0.9173473
}
],
"value": 0.2814906
},
{
"description": "fieldWeight in 0, product of:",
"details": [
{
"description": "tf(freq=1.0), with freq of:",
"details": [
{
"description": "termFreq=1.0",
"details": [],
"value": 1.0
}
],
"value": 1.0
},
{
"description": "idf(docFreq=1, maxDocs=1)",
"details": [],
"value": 0.30685282
},
{
"description": "fieldNorm(doc=0)",
"details": [],
"value": 1.0
}
],
"value": 0.30685282
}
],
"value": 0.08637618
}
],
"value": 0.08637618
},
{
"description": "weight(interests:music in 0) [PerFieldSimilarity], result of:",
"details": [
{
"description": "score(doc=0,freq=1.0), product of:",
"details": [
{
"description": "queryWeight, product of:",
"details": [
{
"description": "idf(docFreq=1, maxDocs=1)",
"details": [],
"value": 0.30685282
},
{
"description": "queryNorm",
"details": [],
"value": 0.9173473
}
],
"value": 0.2814906
},
{
"description": "fieldWeight in 0, product of:",
"details": [
{
"description": "tf(freq=1.0), with freq of:",
"details": [
{
"description": "termFreq=1.0",
"details": [],
"value": 1.0
}
],
"value": 1.0
},
{
"description": "idf(docFreq=1, maxDocs=1)",
"details": [],
"value": 0.30685282
},
{
"description": "fieldNorm(doc=0)",
"details": [],
"value": 1.0
}
],
"value": 0.30685282
}
],
"value": 0.08637618
}
],
"value": 0.08637618
}
],
"value": 0.17275237
},
{
"description": "coord(2/3)",
"details": [],
"value": 0.6666667
}
],
"value": 0.11516824
}
],
"value": 0.11516824
},
"_id": "3918",
"_index": "test_44024988_profiles",
"_node": "urWXg5KhREyffYielaa6Rw",
"_score": 0.11516824,
"_shard": 4,
"_source": {
"full_name": "Alex Test",
"id": 3917,
"interests": [
"games",
"music"
],
"user_id": 3918
},
"_type": "profile_document"
},
... # not interesting doc
],
"max_score": 0.14981213,
"total": 3
},
"timed_out": false,
"took": 3
}
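The explain output above follows Lucene's classic TF-IDF scoring: each matching term contributes queryWeight × fieldWeight = (idf × queryNorm) × (tf × idf × fieldNorm), the contributions are summed, and the sum is multiplied by coord(matched clauses / total clauses). The small sketch below recomputes both scores from the values copied out of the explain output (tf = 1 and fieldNorm = 1, as reported). It shows that the lower score for "Alex Test" is entirely explained by the much smaller idf and different queryNorm on its shard, not by the number of matched terms.

```python
def term_score(idf, query_norm, tf=1.0, field_norm=1.0):
    """Contribution of one matching term, as broken down in the explain output."""
    query_weight = idf * query_norm        # "queryWeight, product of:"
    field_weight = tf * idf * field_norm   # "fieldWeight in N, product of:"
    return query_weight * field_weight


def bool_score(term_scores, max_overlap):
    """Sum of matching should-clauses multiplied by coord(matched / max_overlap)."""
    coord = len(term_scores) / max_overlap
    return sum(term_scores) * coord


# Values copied from the explain output; the terms query has 3 terms.
bob = bool_score([term_score(1.0, 0.4494364)], max_overlap=3)               # shard 2
alex = bool_score([term_score(0.30685282, 0.9173473)] * 2, max_overlap=3)   # shard 4

print(f"Bob Doe   (shard 2): {bob:.8f}")
print(f"Alex Test (shard 4): {alex:.8f}")
```

Matching a second term does raise Alex's raw sum and coord, but each of its terms is worth only about 0.086 versus about 0.449 for Bob's single term.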
My input data:
[{
"full_name": "Bob Doe",
"id": 3916,
"interests": [
"games"
],
"user_id": 3917
}, {
"full_name": "Alex Test",
"id": 3917,
"interests": [
"games",
"music"
],
"user_id": 3918
}, {
"full_name": "Joe Test",
"id": 3918,
"user_id": 3919
}]
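With only three documents spread over five shards (the response reports "total": 5 under "_shards"), each shard computes idf and queryNorm from its own documents only. The sketch below assumes Lucene's classic DefaultSimilarity formulas, which the explain output follows, and a query boost of 1.0. It recomputes the shard-4 values for "Alex Test" (maxDocs=1). The docFreq=0 for "sport" on that shard is an assumption inferred from the reported queryNorm; the explain output does not state it directly.

```python
import math


def idf(doc_freq, max_docs):
    # Lucene DefaultSimilarity: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def query_norm(idfs):
    # 1 / sqrt(sum of squared term weights); boost assumed to be 1.0
    return 1.0 / math.sqrt(sum(w * w for w in idfs))


# Shard 4 per the explain output: maxDocs=1, games and music each docFreq=1.
# ASSUMPTION: "sport" does not occur on that shard (docFreq=0).
games = idf(1, 1)
music = idf(1, 1)
sport = idf(0, 1)

print(f"idf(games) on shard 4 = {games:.8f}")
print(f"queryNorm on shard 4  = {query_norm([games, music, sport]):.7f}")
print(f"idf(docFreq=2, maxDocs=3) on shard 2 = {idf(2, 3):.1f}")
```

A shard holding a single document makes even a matching term look "common" (docFreq equals maxDocs), so its idf drops well below 1.0.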
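Hey! Thanks for the answer. I understand the formula, but now I have a question: is the formula wrong, or are my expectations? I thought 'filter' should not affect the score, and 'should' would work as a straightforward query. – marxin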
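Yes, you are right: the filter does not affect scoring, and that is exactly what happens in your case; you only get the score from the terms query. The thing is, we could compute the tf-idf by hand and check that it matches the formula exactly, and believe me, it will, because it takes into account the rarity of the term. So I wouldn't say this score differs from the one the formula gives. – Mysterion
Let's agree that, given the formula, it works correctly; I just wonder whether it works correctly from an ordinary user's point of view. But maybe that's just me. The other thing is that it doesn't seem stable. Some more context: this is the result I get from a unit test on the CI server, while on my local machine the scores are "correct" (they follow my expectations), even when running the same Elasticsearch with just a different index name. – marxin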
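The instability between CI and the local machine is consistent with per-shard statistics: which shard each document lands on changes idf and queryNorm. The simulation below is a plain-Python sketch, not Elasticsearch itself. It scores the same query as if all three input documents lived on one shard. It assumes no deleted documents, tf = 1, fieldNorm = 1 and boost = 1 (matching the explain output), and the classic idf/queryNorm/coord formulas. Under these assumptions "Alex Test" ranks above "Bob Doe", which is the ordering marxin expected.

```python
import math


def idf(doc_freq, max_docs):
    # Lucene DefaultSimilarity (classic TF-IDF): 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def score(doc_terms, query_terms, docs):
    """Classic TF-IDF score of a should/terms query, using statistics from
    `docs` as if they were all on ONE shard (tf=1, fieldNorm=1, boost=1)."""
    max_docs = len(docs)
    idfs = {t: idf(sum(t in d for d in docs), max_docs) for t in query_terms}
    query_norm = 1.0 / math.sqrt(sum(w * w for w in idfs.values()))
    matched = [t for t in query_terms if t in doc_terms]
    raw = sum(idfs[t] ** 2 * query_norm for t in matched)
    return raw * len(matched) / len(query_terms)  # coord(matched / total)


# "interests" of the three input documents; Joe Test has none.
docs = [{"games"}, {"games", "music"}, set()]
query = ["games", "music", "sport"]

bob = score(docs[0], query, docs)
alex = score(docs[1], query, docs)
print(f"Bob Doe:   {bob:.6f}")
print(f"Alex Test: {alex:.6f}")
```

In Elasticsearch itself, the usual ways to get shard-independent statistics for small test data are an index created with a single primary shard, or a search with search_type=dfs_query_then_fetch, which collects term statistics from all shards before scoring.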