Why does ES get an idf value of 0.30685282?

This is the result I got from the explain API. Why does ES compute an idf value of 0.30685282?

{ 
    "took": 5, 
    "timed_out": false, 
    "_shards": { 
     "total": 5, 
     "successful": 5, 
     "failed": 0 
    }, 
    "hits": { 
     "total": 1, 
     "max_score": 0.13424811, 
     "hits": [ 
      { 
       "_shard": 2, 
       "_node": "Tf1RSzMxQD-AYhmnKQWr8Q", 
       "_index": "scoretest", 
       "_type": "test", 
       "_id": "1", 
       "_score": 0.13424811, 
       "_source": { 
        "content": "this book is about english", 
        "title": "this is a book" 
       }, 
       "_explanation": { 
        "value": 0.13424811, 
        "description": "weight(content:english in 0) [PerFieldSimilarity], result of:", 
        "details": [ 
         { 
          "value": 0.13424811, 
          "description": "fieldWeight in 0, product of:", 
          "details": [ 
           { 
            "value": 1, 
            "description": "tf(freq=1.0), with freq of:", 
            "details": [ 
             { 
              "value": 1, 
              "description": "termFreq=1.0" 
             } 
            ] 
           }, 
           { 
            "value": 0.30685282, 
            "description": "idf(docFreq=1, maxDocs=1)" 
           }, 
           { 
            "value": 0.4375, 
            "description": "fieldNorm(doc=0)" 
           } 
          ] 
         } 
        ] 
       } 
      } 
     ] 
    } 
}

There are two points here that I don't understand:

1. The IDF formula is:

public float idf(long docFreq, long numDocs) {
    return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
}

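Why do docFreq = 1 and numDocs = 1 give an idf value of 0.30685282? My own calculation gives a different number:

log(0.5) + 1.0 = -0.3010299957 + 1.0 = 0.6989700043

2. Why is numDocs 1?

Does numDocs mean the number of documents in my index? My index contains 2 documents, so why is 1 used?

Regarding the second question, please see this query result: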

{ 
    "took": 17, 
    "timed_out": false, 
    "_shards": { 
     "total": 5, 
     "successful": 5, 
     "failed": 0 
    }, 
    "hits": { 
     "total": 2, 
     "max_score": 0.13424811, 
     "hits": [ 
      { 
       "_shard": 2, 
       "_node": "Tf1RSzMxQD-AYhmnKQWr8Q", 
       "_index": "scoretest", 
       "_type": "test", 
       "_id": "1", 
       "_score": 0.13424811, 
       "_source": { 
        "content": "this book is about english", 
        "title": "this is a book" 
       }, 
       "_explanation": { 
        "value": 0.13424811, 
        "description": "weight(content:book in 0) [PerFieldSimilarity], result of:", 
        "details": [ 
         { 
          "value": 0.13424811, 
          "description": "fieldWeight in 0, product of:", 
          "details": [ 
           { 
            "value": 1, 
            "description": "tf(freq=1.0), with freq of:", 
            "details": [ 
             { 
              "value": 1, 
              "description": "termFreq=1.0" 
             } 
            ] 
           }, 
           { 
            "value": 0.30685282, 
            "description": "idf(docFreq=1, maxDocs=1)" 
           }, 
           { 
            "value": 0.4375, 
            "description": "fieldNorm(doc=0)" 
           } 
          ] 
         } 
        ] 
       } 
      }, 
      { 
       "_shard": 3, 
       "_node": "Tf1RSzMxQD-AYhmnKQWr8Q", 
       "_index": "scoretest", 
       "_type": "test", 
       "_id": "2", 
       "_score": 0.13424811, 
       "_source": { 
        "content": "this book is about chinese", 
        "title": "this is a book" 
       }, 
       "_explanation": { 
        "value": 0.13424811, 
        "description": "weight(content:book in 0) [PerFieldSimilarity], result of:", 
        "details": [ 
         { 
          "value": 0.13424811, 
          "description": "fieldWeight in 0, product of:", 
          "details": [ 
           { 
            "value": 1, 
            "description": "tf(freq=1.0), with freq of:", 
            "details": [ 
             { 
              "value": 1, 
              "description": "termFreq=1.0" 
             } 
            ] 
           }, 
           { 
            "value": 0.30685282, 
            "description": "idf(docFreq=1, maxDocs=1)" 
           }, 
           { 
            "value": 0.4375, 
            "description": "fieldNorm(doc=0)" 
           } 
          ] 
         } 
        ] 
       } 
      } 
     ] 
    } 
}

Source

2015-09-21 jianfeng

On the first point: the formula uses the natural logarithm, not base 10. 1 + ln(1 / (1 + 1)) = 0.30685282
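The natural-logarithm calculation above can be checked with a short Java sketch. The `idf` method repeats the formula quoted in the question; the class name `IdfDemo` and the helper `idfBase10` are hypothetical and exist only to reproduce the base-10 figure from the question. The last step multiplies the tf, idf, and fieldNorm values taken from the explain output.

```java
class IdfDemo {
    // Same formula as quoted in the question: Math.log is the natural logarithm.
    static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Hypothetical base-10 variant, used only to reproduce the question's number.
    static double idfBase10(long docFreq, long numDocs) {
        return Math.log10(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // fieldWeight as shown in the explain output: tf * idf * fieldNorm.
    static float fieldWeight(float tf, float idf, float fieldNorm) {
        return tf * idf * fieldNorm;
    }

    public static void main(String[] args) {
        float idf = idf(1, 1);
        System.out.println("idf with natural log: " + idf);
        System.out.println("idf with log base 10: " + idfBase10(1, 1));
        // tf = 1 and fieldNorm = 0.4375 are the values from the explain output.
        System.out.println("fieldWeight:          " + fieldWeight(1.0f, idf, 0.4375f));
    }
}
```

With docFreq = 1 and numDocs = 1, the natural-log version should agree with the 0.30685282 in the explain output, and the product with fieldNorm should agree with the reported score of 0.13424811, up to float rounding.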
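On the second point: yes, it is the number of documents in the index. However, the documents in your index appear to be on different shards, which are effectively separate indexes, at least as far as the document count used for scoring is concerned. You can read more about this on Jeroen van Wilgenburg's blog: How sharding in elasticsearch makes scoring a little less accurate and what to do about it. I think one line in his conclusion is worth emphasizing: "With a bigger collection, the score differences will converge."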
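To illustrate the per-shard effect described above, here is a minimal sketch that applies the question's formula once per shard and once to the combined counts. It assumes, as the `_shard` values 2 and 3 in the second result suggest, that each of the two documents sits alone on its own shard. The class name `ShardIdfDemo` and the helper `sum` are hypothetical, and the code does not call Elasticsearch; it only replays the arithmetic.

```java
class ShardIdfDemo {
    // Formula as quoted in the question.
    static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Hypothetical helper: adds up per-shard counts.
    static long sum(long[] values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumption: two shards hold one document each, and both contain "book".
        long[] docFreqPerShard = {1, 1};
        long[] numDocsPerShard = {1, 1};

        // Each shard scores with its own local statistics.
        for (int i = 0; i < docFreqPerShard.length; i++) {
            System.out.println("shard-local idf: "
                    + idf(docFreqPerShard[i], numDocsPerShard[i]));
        }

        // Statistics combined across shards give a different value.
        System.out.println("combined idf:    "
                + idf(sum(docFreqPerShard), sum(numDocsPerShard)));
    }
}
```

Each shard sees docFreq = 1 and maxDocs = 1, which matches the explain output, while the combined counts (docFreq = 2, numDocs = 2) would give 1 + ln(2/3), roughly 0.5945. Elasticsearch's `dfs_query_then_fetch` search type collects term statistics across shards before scoring, which is one way to get the combined behaviour.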

Source

2015-09-21 05:05:28 femtoRgon

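That answers my first question. As for the second, even after I call _flush to flush the documents, numDocs is still 1 – jianfeng

Please see the addition to the question: when I run the query it returns two documents, but numDocs is still 1 .... – jianfeng

@jianfeng - I think I see now what the actual problem is, and I've edited my answer accordingly. – femtoRgon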
