I got the result below from the explain API. Why does ES compute an idf value of 0.30685282?
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.13424811,
    "hits": [
      {
        "_shard": 2,
        "_node": "Tf1RSzMxQD-AYhmnKQWr8Q",
        "_index": "scoretest",
        "_type": "test",
        "_id": "1",
        "_score": 0.13424811,
        "_source": {
          "content": "this book is about english",
          "title": "this is a book"
        },
        "_explanation": {
          "value": 0.13424811,
          "description": "weight(content:english in 0) [PerFieldSimilarity], result of:",
          "details": [
            {
              "value": 0.13424811,
              "description": "fieldWeight in 0, product of:",
              "details": [
                {
                  "value": 1,
                  "description": "tf(freq=1.0), with freq of:",
                  "details": [
                    {
                      "value": 1,
                      "description": "termFreq=1.0"
                    }
                  ]
                },
                {
                  "value": 0.30685282,
                  "description": "idf(docFreq=1, maxDocs=1)"
                },
                {
                  "value": 0.4375,
                  "description": "fieldNorm(doc=0)"
                }
              ]
            }
          ]
        }
      }
    ]
  }
}
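There are two things I don't understand here:
1. The IDF formula is:
public float idf(long docFreq, long numDocs) {
    return (float)(Math.log(numDocs/(double)(docFreq+1)) + 1.0);
}
Why do docFreq = 1 and numDocs = 1 give an idf value of 0.30685282?
By my calculation, log(0.5) = -0.3010299957, and -0.3010299957 + 1.0 = 0.6989700043.
2. Why is numDocs 1?
Does numDocs mean the number of documents in my index? My index contains 2 documents, so why is 1 used?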
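For point 1, the difference comes down to the base of the logarithm. The sketch below re-runs the quoted formula twice: once with `Math.log`, which in `java.lang.Math` is the natural logarithm, and once with `Math.log10`, which reproduces the hand calculation. The class name `IdfSketch` and the helper `idfBase10` are made up for illustration and are not part of Lucene or Elasticsearch.

```java
class IdfSketch {
    // Same expression as the formula quoted in the question.
    // Math.log is the natural logarithm (base e).
    static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Hypothetical variant using base 10, only to reproduce the hand calculation.
    static float idfBase10(long docFreq, long numDocs) {
        return (float) (Math.log10(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // ln(1/2) + 1, approximately 0.3069, the value shown by the explain API
        System.out.println("natural log: " + idf(1, 1));
        // log10(1/2) + 1, approximately 0.6990, the value computed by hand
        System.out.println("base 10:     " + idfBase10(1, 1));
    }
}
```

With docFreq = 1 and numDocs = 1 the argument of the logarithm is 1/2 in both cases; only the base changes the result.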
Regarding question 2, see this query result:
{
  "took": 17,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.13424811,
    "hits": [
      {
        "_shard": 2,
        "_node": "Tf1RSzMxQD-AYhmnKQWr8Q",
        "_index": "scoretest",
        "_type": "test",
        "_id": "1",
        "_score": 0.13424811,
        "_source": {
          "content": "this book is about english",
          "title": "this is a book"
        },
        "_explanation": {
          "value": 0.13424811,
          "description": "weight(content:book in 0) [PerFieldSimilarity], result of:",
          "details": [
            {
              "value": 0.13424811,
              "description": "fieldWeight in 0, product of:",
              "details": [
                {
                  "value": 1,
                  "description": "tf(freq=1.0), with freq of:",
                  "details": [
                    {
                      "value": 1,
                      "description": "termFreq=1.0"
                    }
                  ]
                },
                {
                  "value": 0.30685282,
                  "description": "idf(docFreq=1, maxDocs=1)"
                },
                {
                  "value": 0.4375,
                  "description": "fieldNorm(doc=0)"
                }
              ]
            }
          ]
        }
      },
      {
        "_shard": 3,
        "_node": "Tf1RSzMxQD-AYhmnKQWr8Q",
        "_index": "scoretest",
        "_type": "test",
        "_id": "2",
        "_score": 0.13424811,
        "_source": {
          "content": "this book is about chinese",
          "title": "this is a book"
        },
        "_explanation": {
          "value": 0.13424811,
          "description": "weight(content:book in 0) [PerFieldSimilarity], result of:",
          "details": [
            {
              "value": 0.13424811,
              "description": "fieldWeight in 0, product of:",
              "details": [
                {
                  "value": 1,
                  "description": "tf(freq=1.0), with freq of:",
                  "details": [
                    {
                      "value": 1,
                      "description": "termFreq=1.0"
                    }
                  ]
                },
                {
                  "value": 0.30685282,
                  "description": "idf(docFreq=1, maxDocs=1)"
                },
                {
                  "value": 0.4375,
                  "description": "fieldNorm(doc=0)"
                }
              ]
            }
          ]
        }
      }
    ]
  }
}
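A note on reading this output: the two hits come from different shards (`_shard` 2 and 3 out of 5), and each explanation reports `idf(docFreq=1, maxDocs=1)`. That is consistent with term statistics being computed per shard rather than across the whole index, so each shard sees only its own single document. The sketch below is only an illustration under that assumption: it multiplies the three factors listed under `fieldWeight` to reproduce the `_score`, and compares the per-shard idf with what the quoted formula would give if both documents were counted together (docFreq = 2, numDocs = 2 for the term `book`). The class and method names are hypothetical.

```java
class ScoreSketch {
    // idf formula as quoted in the question (natural logarithm).
    static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // fieldWeight is described in the explanation as a product of tf, idf and fieldNorm.
    static float fieldWeight(float tf, float idf, float fieldNorm) {
        return tf * idf * fieldNorm;
    }

    public static void main(String[] args) {
        // Values taken from the explanation: tf = 1, fieldNorm = 0.4375.
        float perShardIdf = idf(1, 1);   // each shard holds one document
        float globalIdf = idf(2, 2);     // hypothetical: both documents counted together

        System.out.println("per-shard idf:   " + perShardIdf);
        System.out.println("per-shard score: " + fieldWeight(1.0f, perShardIdf, 0.4375f));
        System.out.println("index-wide idf:  " + globalIdf);
        System.out.println("index-wide score:" + fieldWeight(1.0f, globalIdf, 0.4375f));
    }
}
```

If index-wide statistics are wanted for a small test index like this one, Elasticsearch documents a `dfs_query_then_fetch` search type that collects term statistics from all shards before scoring; whether it changes the numbers in this particular setup was not checked here.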
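I got the answer to the first question, but as for the second: after I called _flush to flush the documents, numDocs is still 1. – jianfeng
Please see the addition to the question: when I run the query it returns two documents, but numDocs is still 1 .... – jianfeng
@jianfeng - I think I see now what the actual issue is, and I have edited my answer accordingly. – femtoRgon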