2015-09-21

I got the following result from the explain API. Why does Elasticsearch compute an idf value of 0.30685282?

{ 
    "took": 5, 
    "timed_out": false, 
    "_shards": { 
     "total": 5, 
     "successful": 5, 
     "failed": 0 
    }, 
    "hits": { 
     "total": 1, 
     "max_score": 0.13424811, 
     "hits": [ 
      { 
       "_shard": 2, 
       "_node": "Tf1RSzMxQD-AYhmnKQWr8Q", 
       "_index": "scoretest", 
       "_type": "test", 
       "_id": "1", 
       "_score": 0.13424811, 
       "_source": { 
        "content": "this book is about english", 
        "title": "this is a book" 
       }, 
       "_explanation": { 
        "value": 0.13424811, 
        "description": "weight(content:english in 0) [PerFieldSimilarity], result of:", 
        "details": [ 
         { 
          "value": 0.13424811, 
          "description": "fieldWeight in 0, product of:", 
          "details": [ 
           { 
            "value": 1, 
            "description": "tf(freq=1.0), with freq of:", 
            "details": [ 
             { 
              "value": 1, 
              "description": "termFreq=1.0" 
             } 
            ] 
           }, 
           { 
            "value": 0.30685282, 
            "description": "idf(docFreq=1, maxDocs=1)" 
           }, 
           { 
            "value": 0.4375, 
            "description": "fieldNorm(doc=0)" 
           } 
          ] 
         } 
        ] 
       } 
      } 
     ] 
    } 
} 
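The explanation tree above reports `fieldWeight` as the "product of" three factors. The minimal sketch below multiplies the reported factor values back together. It assumes Lucene's classic `DefaultSimilarity`, where tf is `sqrt(freq)`. The factor values are copied from the output, and the class name is illustrative.

```java
public class FieldWeightCheck {
    // tf as in Lucene's classic DefaultSimilarity (assumption): square root of the term frequency
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // fieldWeight = tf * idf * fieldNorm, following the "product of:" node in the explain output
    static float fieldWeight(float freq, float idf, float fieldNorm) {
        return tf(freq) * idf * fieldNorm;
    }

    public static void main(String[] args) {
        float idf = 0.30685282f;   // from "idf(docFreq=1, maxDocs=1)"
        float fieldNorm = 0.4375f; // from "fieldNorm(doc=0)"
        float w = fieldWeight(1.0f, idf, fieldNorm);
        // Should be very close to the reported _score of 0.13424811
        System.out.println(w);
    }
}
```

Because tf is 1 here, the score is just idf × fieldNorm, so the whole question reduces to where the idf value comes from.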

There are two things I don't understand here:

1. The IDF formula is:

// Lucene DefaultSimilarity idf: Math.log is the natural logarithm; docFreq is smoothed by +1
public float idf(long docFreq, long numDocs) {
    return (float)(Math.log(numDocs / (double)(docFreq + 1)) + 1.0);
}
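The formula above can be evaluated directly. The sketch below reuses the same `idf` body and adds a base-10 variant only to reproduce the hand calculation in the question. Java's `Math.log` is the natural logarithm, so `docFreq=1, numDocs=1` gives 1 + ln(0.5). The class and the `idfLog10` helper are illustrative, not part of Lucene.

```java
public class IdfCheck {
    // Same body as the idf() shown above; Math.log is the natural logarithm (base e)
    static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Hypothetical base-10 variant, only to mirror the log10-based hand calculation
    static float idfLog10(long docFreq, long numDocs) {
        return (float) (Math.log10(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        System.out.println(idf(1, 1));      // natural log: roughly 0.30685282
        System.out.println(idfLog10(1, 1)); // base 10: roughly 0.69897
    }
}
```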

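Why do docFreq = 1 and numDocs = 1 give an idf value of 0.30685282? By my calculation:

log(0.5) + 1.0 = -0.3010299957 + 1.0 = 0.6989700043

2. Why is numDocs 1?

Doesn't numDocs mean the number of documents in my index? My index has 2 documents, so why is 1 used?

Regarding question 2, see the result of this query: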

{ 
    "took": 17, 
    "timed_out": false, 
    "_shards": { 
     "total": 5, 
     "successful": 5, 
     "failed": 0 
    }, 
    "hits": { 
     "total": 2, 
     "max_score": 0.13424811, 
     "hits": [ 
      { 
       "_shard": 2, 
       "_node": "Tf1RSzMxQD-AYhmnKQWr8Q", 
       "_index": "scoretest", 
       "_type": "test", 
       "_id": "1", 
       "_score": 0.13424811, 
       "_source": { 
        "content": "this book is about english", 
        "title": "this is a book" 
       }, 
       "_explanation": { 
        "value": 0.13424811, 
        "description": "weight(content:book in 0) [PerFieldSimilarity], result of:", 
        "details": [ 
         { 
          "value": 0.13424811, 
          "description": "fieldWeight in 0, product of:", 
          "details": [ 
           { 
            "value": 1, 
            "description": "tf(freq=1.0), with freq of:", 
            "details": [ 
             { 
              "value": 1, 
              "description": "termFreq=1.0" 
             } 
            ] 
           }, 
           { 
            "value": 0.30685282, 
            "description": "idf(docFreq=1, maxDocs=1)" 
           }, 
           { 
            "value": 0.4375, 
            "description": "fieldNorm(doc=0)" 
           } 
          ] 
         } 
        ] 
       } 
      }, 
      { 
       "_shard": 3, 
       "_node": "Tf1RSzMxQD-AYhmnKQWr8Q", 
       "_index": "scoretest", 
       "_type": "test", 
       "_id": "2", 
       "_score": 0.13424811, 
       "_source": { 
        "content": "this book is about chinese", 
        "title": "this is a book" 
       }, 
       "_explanation": { 
        "value": 0.13424811, 
        "description": "weight(content:book in 0) [PerFieldSimilarity], result of:", 
        "details": [ 
         { 
          "value": 0.13424811, 
          "description": "fieldWeight in 0, product of:", 
          "details": [ 
           { 
            "value": 1, 
            "description": "tf(freq=1.0), with freq of:", 
            "details": [ 
             { 
              "value": 1, 
              "description": "termFreq=1.0" 
             } 
            ] 
           }, 
           { 
            "value": 0.30685282, 
            "description": "idf(docFreq=1, maxDocs=1)" 
           }, 
           { 
            "value": 0.4375, 
            "description": "fieldNorm(doc=0)" 
           } 
          ] 
         } 
        ] 
       } 
      } 
     ] 
    } 
} 

Answer

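  1. It is the natural logarithm, not base 10: 1 + ln(1 / (1 + 1)) = 0.30685282

  2. Yes, it is the number of documents in the index. However, documents in different shards behave as if they were in separate indexes, at least as far as the document counts used for scoring are concerned. You can read more about this on Jeroen van Wilgenburg's blog: How sharding in elasticsearch makes scoring a little less accurate and what to do about it. I think one line in his conclusion is worth highlighting: "the bigger the collection, the more the score differences will converge."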
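To make item 2 concrete, here is a hypothetical simulation of the setup in the question. There are two shards, each holding one of the two documents, and both documents contain `book`. With per-shard statistics, each shard computes idf(docFreq=1, maxDocs=1), which matches both explain outputs. Pooling the statistics across shards gives idf(docFreq=2, maxDocs=2) instead. That pooling is roughly what Elasticsearch's `search_type=dfs_query_then_fetch` does before scoring. The shard and term-set representation is an illustrative assumption, not Elasticsearch internals.

```java
import java.util.List;
import java.util.Set;

public class ShardIdf {
    // Same idf formula as in the question (natural log)
    static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Number of documents in one shard that contain the term
    static long docFreq(List<Set<String>> shard, String term) {
        return shard.stream().filter(doc -> doc.contains(term)).count();
    }

    // idf computed from a single shard's statistics only (what the explain output shows)
    static float shardLocalIdf(List<Set<String>> shard, String term) {
        return idf(docFreq(shard, term), shard.size());
    }

    // idf computed from statistics summed over all shards
    static float globalIdf(List<List<Set<String>>> shards, String term) {
        long df = 0;
        long n = 0;
        for (List<Set<String>> shard : shards) {
            df += docFreq(shard, term);
            n += shard.size();
        }
        return idf(df, n);
    }

    public static void main(String[] args) {
        // Hypothetical layout mirroring the question: doc 1 on one shard, doc 2 on another
        List<Set<String>> shardA = List.of(Set.of("this", "book", "is", "about", "english"));
        List<Set<String>> shardB = List.of(Set.of("this", "book", "is", "about", "chinese"));
        List<List<Set<String>>> shards = List.of(shardA, shardB);

        System.out.println("shard A idf(book): " + shardLocalIdf(shardA, "book"));
        System.out.println("shard B idf(book): " + shardLocalIdf(shardB, "book"));
        System.out.println("global  idf(book): " + globalIdf(shards, "book"));
    }
}
```

Both documents still score the same here only because the two shards happen to have identical statistics. With pooled statistics, the idf would be 1 + ln(2/3), about 0.5945, for both.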


That answers my first question, but for the second: after I call _flush to flush the documents, numDocs is still 1. – jianfeng


Please see the addition to the question: when I run the query, it returns two documents, but numDocs is still 1. – jianfeng


@jianfeng - I think I now see what the actual issue is, and I have edited my answer accordingly. – femtoRgon
