2017-09-23

Here is my Elasticsearch (ES) setup and query; I am trying to get an aggregated term count.

===Create the index===

PUT /sample 

===Insert data===

PUT /sample/docs/1 
{"data": "And the world said, 'Disarm, disclose, or face serious consequences'—and therefore, we worked with the world, we worked to make sure that Saddam Hussein heard the message of the world."} 
PUT /sample/docs/2 
{"data": "Never give in — never, never, never, never, in nothing great or small, large or petty, never give in except to convictions of honour and good sense. Never yield to force; never yield to the apparently overwhelming might of the enemy"} 

===Query===

POST /sample/docs/_search
{
    "query": {
        "match": {
            "data": "never"
        }
    },
    "highlight": {
        "fields": {
            "data": {}
        }
    }
}

===Search result===

...
    "highlight": {
        "data": [
            "<em>Never</em> give in — <em>never</em>, <em>never</em>, <em>never</em>, <em>never</em>, in nothing great or small, large or petty, <em>never</em> give",
            " in except to convictions of honour and good sense. <em>Never</em> yield to force; <em>never</em> yield to the apparently overwhelming might of the enemy"
        ]
    }

===Desired result===

I need the frequency of the search term in each matching document, as in the following example:

Doc Id: 2 
Term Frequency :{ 
    "never": 8 
} 

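I have tried bucket aggregations, terms aggregations, and other aggregations, but none of them gives me this result.

Thanks for your help!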

Answer

You should use the Term Vectors API, which returns per-term frequencies for a document and lets you filter terms by frequency.

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-termvectors.html

In this case, your query would be:

GET /sample/docs/_termvectors
{
    "doc": {
        "data": "never"
    },
    "term_statistics": true,
    "field_statistics": true,
    "positions": false,
    "offsets": false,
    "filter": {
        "min_term_freq": 8
    }
}
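
If the goal is the frequency of a term inside one stored document, the term vectors response for that document carries a `term_freq` value per term. Here is a minimal Python sketch of reading it. It assumes a typed index as in the question, a request such as `GET /sample/docs/2/_termvectors?fields=data`, and a response of the documented shape. The `term_frequency` helper and the trimmed `sample_response` are illustrative only, not real output from this cluster.

```python
def term_frequency(response, field, term):
    """Return term_freq for `term` in `field`, or 0 if it is absent."""
    terms = response.get("term_vectors", {}).get(field, {}).get("terms", {})
    return terms.get(term, {}).get("term_freq", 0)


# Hypothetical, trimmed response for document 2 (statistics, positions,
# and offsets omitted); a real response contains more keys.
sample_response = {
    "_index": "sample",
    "_type": "docs",
    "_id": "2",
    "found": True,
    "term_vectors": {
        "data": {
            "terms": {
                "never": {"term_freq": 8},
                "yield": {"term_freq": 2},
            }
        }
    },
}

print({"never": term_frequency(sample_response, "data", "never")})
```

Terms are stored after analysis, so with the default analyzer the lookup key would be the lowercased form (`never`, not `Never`).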

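I get the following error when I run the query you suggested: '{ "error": { "root_cause": [ { "type": "illegal_state_exception", "reason": "Something is wrong with the field statistics of the term vector request: Values are \nsum_doc_freq 0\ndoc_count 0\nsum_ttf 0" } ], "type": "illegal_state_exception", "reason": "Something is wrong with the field statistics of the term vector request: Values are \nsum_doc_freq 0\ndoc_count 0\nsum_ttf 0" }, "status": 500 }' – Callisto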

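Also, my requirement is different: the query you suggested would return terms whose frequency is at least 8, whereas I want the actual term frequency count for the search term. – Callisto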
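
For reference, the count asked for above can be cross-checked outside Elasticsearch. This is a rough Python sketch that assumes a simple lowercase, word-character tokenizer. It only approximates the default analyzer and is not how Elasticsearch itself computes term frequency; `term_counts` is a hypothetical helper.

```python
import re
from collections import Counter


def term_counts(text):
    """Count lowercase word tokens in `text` (a rough stand-in for analysis)."""
    return Counter(re.findall(r"\w+", text.lower()))


# Document 2 from the question, copied verbatim.
doc_2 = (
    "Never give in — never, never, never, never, in nothing great or small, "
    "large or petty, never give in except to convictions of honour and good "
    "sense. Never yield to force; never yield to the apparently overwhelming "
    "might of the enemy"
)

print({"never": term_counts(doc_2)["never"]})  # prints {'never': 8}
```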