2014-04-13

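ElasticSearch scoring question

I am trying to figure out the logic ElasticSearch uses when ranking results by score.

I have 4 indexes in total. I am querying all of them for a single term. The query I use is as follows: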

GET /_all/static/_search
{
  "query": {
    "match": {
      "name": "chinese"
    }
  }
}

The (partial) response I get is as follows:

{ 
    "took": 17, 
    "timed_out": false, 
    "_shards": { 
     "total": 40, 
     "successful": 40, 
     "failed": 0 
    }, 
    "hits": { 
     "total": 6, 
     "max_score": 2.96844, 
     "hits": [ 
     { 
      "_shard": 1, 
      "_node": "Hz9L2DZ-ShSajaNvoyU8Eg", 
      "_index": "restaurant", 
      "_type": "static", 
      "_id": "XecLkyYNQWihuR2atFc5JQ", 
      "_score": 2.96844, 
      "_source": { 
       "name": "Just Chinese" 
      }, 
      "_explanation": { 
       "value": 2.96844, 
       "description": "weight(name:chinese in 1) [PerFieldSimilarity], result of:", 
       "details": [ 
        { 
        "value": 2.96844, 
        "description": "fieldWeight in 1, product of:", 
        "details": [ 
         { 
          "value": 1, 
          "description": "tf(freq=1.0), with freq of:", 
          "details": [ 
           { 
           "value": 1, 
           "description": "termFreq=1.0" 
           } 
          ] 
         }, 
         { 
          "value": 4.749504, 
          "description": "idf(docFreq=3, maxDocs=170)" 
         }, 
         { 
          "value": 0.625, 
          "description": "fieldNorm(doc=1)" 
         } 
        ] 
        } 
       ] 
      } 
     }, 
     { 
      "_shard": 1, 
      "_node": "Hz9L2DZ-ShSajaNvoyU8Eg", 
      "_index": "restaurant", 
      "_type": "static", 
      "_id": "IAUpkC55ReySjvl9Xr5MVw", 
      "_score": 2.96844, 
      "_source": { 
       "name": "The Chinese Hut" 
      }, 
      "_explanation": { 
       "value": 2.96844, 
       "description": "weight(name:chinese in 5) [PerFieldSimilarity], result of:", 
       "details": [ 
        { 
        "value": 2.96844, 
        "description": "fieldWeight in 5, product of:", 
        "details": [ 
         { 
          "value": 1, 
          "description": "tf(freq=1.0), with freq of:", 
          "details": [ 
           { 
           "value": 1, 
           "description": "termFreq=1.0" 
           } 
          ] 
         }, 
         { 
          "value": 4.749504, 
          "description": "idf(docFreq=3, maxDocs=170)" 
         }, 
         { 
          "value": 0.625, 
          "description": "fieldNorm(doc=5)" 
         } 
        ] 
        } 
       ] 
      } 
     }, 
     { 
      "_shard": 2, 
      "_node": "Hz9L2DZ-ShSajaNvoyU8Eg", 
      "_index": "cuisine", 
      "_type": "static", 
      "_id": "6", 
      "_score": 2.7047482, 
      "_source": { 
       "name": "Chinese" 
      }, 
      "_explanation": { 
       "value": 2.7047482, 
       "description": "weight(name:chinese in 1) [PerFieldSimilarity], result of:", 
       "details": [ 
        { 
        "value": 2.7047482, 
        "description": "fieldWeight in 1, product of:", 
        "details": [ 
         { 
          "value": 1, 
          "description": "tf(freq=1.0), with freq of:", 
          "details": [ 
           { 
           "value": 1, 
           "description": "termFreq=1.0" 
           } 
          ] 
         }, 
         { 
          "value": 2.7047482, 
          "description": "idf(docFreq=1, maxDocs=11)" 
         }, 
         { 
          "value": 1, 
          "description": "fieldNorm(doc=1)" 
         } 
        ] 
        } 
       ] 
      } 
     }, 

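My question is this: as I understand it, elasticsearch treats shorter values as more relevant. So why are results such as "Just Chinese" and "The Chinese Hut" from the restaurant index ranked above "Chinese" from the cuisine index, which I expected to be the best match? As far as I know, I did not use any special analyzer or anything else when inserting these documents into the indexes. Everything is default.

What am I missing, and how do I get the expected result?

Answers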

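One of the important parameters in calculating the score is inverse document frequency (IDF). By default, each elasticsearch shard estimates the global IDF from its own local IDF. This works well when you have many similar records distributed evenly across shards. However, if you have only a few records, or if you combine results from shards that hold different kinds of records (restaurant names and cuisine names), the estimated IDF can produce strange results. In your output, the restaurant shard reports idf(docFreq=3, maxDocs=170) = 4.749504 while the cuisine shard reports idf(docFreq=1, maxDocs=11) = 2.7047482, so the restaurant hits win even after their lower fieldNorm of 0.625 is applied. The solution to this problem is to use elasticsearch's dfs_query_then_fetch search mode.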
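To make the arithmetic concrete, here is a small sketch that recomputes the scores from the explain output in the question. It assumes the similarity in use is Lucene's classic TF-IDF formula, idf = 1 + ln(maxDocs / (docFreq + 1)), which does reproduce the idf values shown there. The tf and fieldNorm values are copied from the explain output rather than derived, and the function names are only illustrative.

```python
import math


def classic_idf(doc_freq, max_docs):
    """Classic Lucene TF-IDF inverse document frequency (assumed similarity)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(tf, idf, field_norm):
    """fieldWeight as listed in the explain output: tf * idf * fieldNorm."""
    return tf * idf * field_norm


# Per-shard statistics taken from the explain output in the question.
restaurant_score = field_weight(1.0, classic_idf(doc_freq=3, max_docs=170), 0.625)
cuisine_score = field_weight(1.0, classic_idf(doc_freq=1, max_docs=11), 1.0)

print("restaurant shard:", round(restaurant_score, 5))
print("cuisine shard:   ", round(cuisine_score, 5))
print("restaurant ranks first:", restaurant_score > cuisine_score)
```

The two results should land close to the 2.96844 and 2.7047482 reported in the response. The gap comes entirely from each shard using its own docFreq and maxDocs, which is what dfs_query_then_fetch avoids by collecting term statistics from all shards before scoring.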

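By the way, to understand how elasticsearch calculates the score, you can use the explain parameter in the search request body or in the URL. So when you ask a question about scoring, it helps to include the output with explain set to true.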
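Both suggestions can be combined in one request. The sketch below only builds the request path and body for the query from the question and does not send anything. The helper name build_search_request is hypothetical. search_type is passed as a URL parameter and explain as a field in the request body, which is one of the two places it can go.

```python
import json
from urllib.parse import urlencode


def build_search_request(field, text, search_type=None, explain=False):
    """Return (path, body) for a match query over all indexes, type 'static'.

    Hypothetical helper for illustration; it does not contact a cluster.
    """
    path = "/_all/static/_search"
    if search_type:
        # e.g. ?search_type=dfs_query_then_fetch
        path += "?" + urlencode({"search_type": search_type})
    body = {"query": {"match": {field: text}}}
    if explain:
        # Ask elasticsearch to attach an _explanation to every hit.
        body["explain"] = True
    return path, json.dumps(body, indent=2)


path, body = build_search_request(
    "name", "chinese", search_type="dfs_query_then_fetch", explain=True
)
print("GET", path)
print(body)
```

The printed path and body can be pasted into whatever client you use to talk to the cluster. With explain enabled, the idf lines in the new output should show the same docFreq and maxDocs for every hit, since the term statistics are now gathered across all shards.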

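dfs_query_then_fetch worked! Now I also understand why it behaves this way. Thanks for the explanation! Also, I have edited the response above to include the explain output for the original response. – arijeet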