Querying all unique values of a field with Elasticsearch

18

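You can use a terms facet on the 'full_name' field. To do this properly, though, you need to make sure the field is not tokenized at index time; otherwise every entry in the facet will be a single term taken from the field content rather than the whole value. You will most likely need to configure it as 'not_analyzed' in your mapping. If you also search on it and still want it tokenized, you can use a multi field to index it in two different ways.

You also need to take into account that, depending on the number of unique terms in the full_name field, this operation can be expensive and require quite a bit of memory.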
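As a rough illustration of the "index it in two different ways" idea, here is a minimal sketch of such a mapping built as a Python dict. It assumes the `fields` syntax of Elasticsearch 1.x/2.x (older releases used a separate `multi_field` type, and 5.x replaced `string`/`not_analyzed` with `text`/`keyword`). The type name `person` and the sub-field name `raw` are made up for the example.

```python
import json


def build_full_name_mapping(doc_type="person", raw_subfield="raw"):
    """Return a legacy (1.x/2.x style) mapping that indexes full_name twice.

    `doc_type` and `raw_subfield` are hypothetical names chosen for this sketch.
    """
    return {
        doc_type: {
            "properties": {
                "full_name": {
                    # Analyzed (tokenized) version, used for full-text search.
                    "type": "string",
                    "fields": {
                        # Untokenized copy, used for the facet/aggregation.
                        raw_subfield: {"type": "string", "index": "not_analyzed"}
                    },
                }
            }
        }
    }


mapping = build_full_name_mapping()
print(json.dumps(mapping, indent=2))
```

With a mapping like this, searches would go against `full_name` and the facet would target `full_name.raw`.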

Source

2013-01-23 12:04:26 javanna

12

For Elasticsearch 1.0 and later, you can use the terms aggregation to do this.

Query DSL:

{
    "aggs": {
        "NAME": {
            "terms": {
                "field": "",
                "size": 10
            }
        }
    }
}

A real example:

{
    "aggs": {
        "full_name": {
            "terms": {
                "field": "authors",
                "size": 0
            }
        }
    }
}

You then get all unique values of the authors field. "size": 0 means the number of terms is not limited (this requires Elasticsearch 1.1.0 or later).

Response:

{
    ...

    "aggregations": {
        "full_name": {
            "buckets": [
                {
                    "key": "Ken",
                    "doc_count": 10
                },
                {
                    "key": "Jim Gray",
                    "doc_count": 10
                }
            ]
        }
    }
}
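Once you have a response shaped like this, the unique values are simply the bucket keys. A small sketch of pulling them out in Python (the helper name `unique_values` is mine, and the sample dict just mirrors the response shown here):

```python
def unique_values(response, agg_name):
    """Return the bucket keys of a terms aggregation, in response order."""
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    return [bucket["key"] for bucket in buckets]


# Hand-written copy of the response structure, for illustration only.
sample_response = {
    "aggregations": {
        "full_name": {
            "buckets": [
                {"key": "Ken", "doc_count": 10},
                {"key": "Jim Gray", "doc_count": 10},
            ]
        }
    }
}

print(unique_values(sample_response, "full_name"))  # ['Ken', 'Jim Gray']
```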

See Elasticsearch terms aggregations.

Source

2014-10-30 07:28:21

+0

What does full_name mean? – neustart47

+2

@neustart47 full_name is just the name of the aggregation –

4

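The existing answers did not work for me on Elasticsearch 5.X, for the following reasons:

I needed my input to be tokenized at index time.
"size": 0 failed to parse, because "[size] must be greater than 0".
"Fielddata is disabled on text fields by default." This means that by default you cannot aggregate on the full_name text field. However, an unanalyzed keyword field can be used for aggregations.

Solution 1: Use the Scroll API. It works by keeping a search context and issuing multiple requests, each one returning the next batch of results. If you are using Python, the elasticsearch module has the scan() helper function, which handles the scrolling for you and returns all results.

Solution 2: Use the Search After API. It is similar to Scroll, but provides a live cursor instead of keeping a search context, so it is more efficient for real-time requests.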
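For the Python route, a minimal sketch of collecting distinct values while scrolling might look like the following. It assumes the official `elasticsearch` package and its `elasticsearch.helpers.scan` helper; the function names, the `match_all` query and the sample hits are my own illustration, and the de-duplication happens client side.

```python
def collect_unique(hits, field):
    """Collect distinct values of `field` from an iterable of search hits."""
    seen = set()
    for hit in hits:
        value = hit.get("_source", {}).get(field)
        if value is None:
            continue
        # A field may hold a single value or a list of values.
        values = value if isinstance(value, list) else [value]
        seen.update(values)
    return seen


def scan_unique(client, index, field):
    """Stream every document through the scroll API and collect one field.

    Needs the `elasticsearch` package; it is imported lazily so that
    collect_unique() stays usable without it.
    """
    from elasticsearch.helpers import scan

    query = {"query": {"match_all": {}}, "_source": [field]}
    return collect_unique(scan(client, query=query, index=index), field)


# Offline demonstration with hand-written hits (hypothetical data).
fake_hits = [
    {"_source": {"full_name": "Jim Gray"}},
    {"_source": {"full_name": "Ken"}},
    {"_source": {"full_name": "Jim Gray"}},
]
print(sorted(collect_unique(fake_hits, "full_name")))  # ['Jim Gray', 'Ken']
```

Keep in mind this reads every document, so it trades server-side memory for network traffic and time.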

Source

2017-02-20 19:11:07

+0

I'm not sure this addresses the "size": 0 problem, since as far as I can tell from the docs the default is 10... – Trejkaz

+0

@Trejkaz Thanks; I have updated my answer. –

0

For Elasticsearch 5.2.2

curl -XGET 'http://localhost:9200/articles/_search?pretty' -d '
{
    "aggs": {
        "whatever": {
            "terms": { "field": "yourfield", "size": 10000 }
        }
    },
    "size": 0
}'

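The "size":10000 means: return (at most) 10000 unique values. Without it, only 10 values are returned even if there are more than 10 unique values.

The "size":0 means the "hits" in the result will not contain any documents. By default 10 documents are returned, which we don't need here.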

Reference: bucket terms aggregation

Also note that, according to this page, facets were replaced in Elasticsearch 1.0 by aggregations, which are a superset of facets.
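If you build the request from code rather than curl, the two different "size" settings are easy to mix up, so here is a small sketch that keeps them apart. The function name and its defaults are mine; the body it produces is the same one sent in the curl command in this answer.

```python
import json


def build_unique_values_request(field, max_terms=10000, agg_name="whatever"):
    """Build a search body that returns only a terms aggregation."""
    return {
        # Top-level size: number of hits (documents) to return; 0 means none.
        "size": 0,
        "aggs": {
            agg_name: {
                # Aggregation size: maximum number of unique terms returned.
                "terms": {"field": field, "size": max_terms}
            }
        },
    }


body = build_unique_values_request("yourfield")
print(json.dumps(body, indent=2))
```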

Source

2017-12-01 22:31:04 sam

0

Intuition: in SQL parlance:

Select distinct full_name from authors;

is equivalent to

Select full_name from authors group by full_name;

So we can use the grouping/aggregation syntax in Elasticsearch to find the distinct entries.
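If you want to convince yourself of the SQL equivalence, here is a quick check using Python's built-in sqlite3 module and an in-memory table. The rows are made up, and ORDER BY is added only so the two result lists can be compared directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (full_name TEXT)")
conn.executemany(
    "INSERT INTO authors (full_name) VALUES (?)",
    [("Brian Kernighan",), ("Charles Dickens",), ("Brian Kernighan",)],
)

# Same two statements as in the text, plus a deterministic ordering.
distinct_rows = conn.execute(
    "SELECT DISTINCT full_name FROM authors ORDER BY full_name"
).fetchall()
grouped_rows = conn.execute(
    "SELECT full_name FROM authors GROUP BY full_name ORDER BY full_name"
).fetchall()

print(distinct_rows == grouped_rows)  # True
conn.close()
```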

Assume the following is the structure stored in Elasticsearch:

[
    {
        "author": "Brian Kernighan"
    },
    {
        "author": "Charles Dickens"
    }
]

What doesn't work: a plain aggregation

{
    "aggs": {
        "full_name": {
            "terms": {
                "field": "author"
            }
        }
    }
}

I got the following error:

{
    "error": {
        "root_cause": [
            {
                "reason": "Fielddata is disabled on text fields by default...",
                "type": "illegal_argument_exception"
            }
        ]
    }
}

What works like a charm: appending .keyword to the field

{
    "aggs": {
        "full_name": {
            "terms": {
                "field": "author.keyword"
            }
        }
    }
}

And a sample output could be:

{
    "aggregations": {
        "full_name": {
            "buckets": [
                {
                    "doc_count": 372,
                    "key": "Charles Dickens"
                },
                {
                    "doc_count": 283,
                    "key": "Brian Kernighan"
                }
            ],
            "doc_count": 1000
        }
    }
}
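The .keyword suffix works when the index was created through dynamic mapping: in Elasticsearch 5.x and later, a string is then mapped as a text field with a keyword sub-field. A small sketch of checking a mapping for an aggregatable field name follows; the helper name is mine, and the dict is a hand-written copy of what that default mapping typically looks like, so check your own index mapping before relying on it.

```python
def aggregatable_field(properties, field):
    """Return a field name suitable for a terms aggregation, or None."""
    spec = properties.get(field, {})
    if spec.get("type") == "keyword":
        return field
    # Otherwise look for a keyword sub-field (multi-field) under this field.
    for sub_name, sub_spec in spec.get("fields", {}).items():
        if sub_spec.get("type") == "keyword":
            return f"{field}.{sub_name}"
    return None


# Assumed shape of the default dynamic mapping for a string value.
default_dynamic_properties = {
    "author": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    }
}

print(aggregatable_field(default_dynamic_properties, "author"))  # author.keyword
```

If the helper returns None, the field has no keyword variant and the mapping would need to change before a terms aggregation can run on it.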

Bonus tip:

Let us assume the field in question is nested as follows:

[
    {
        "authors": [{
            "details": [{
                "name": "Brian Kernighan"
            }]
        }]
    },
    {
        "authors": [{
            "details": [{
                "name": "Charles Dickens"
            }]
        }]
    }
]

The correct query now becomes:

{
    "aggregations": {
        "full_name": {
            "aggregations": {
                "author_details": {
                    "terms": {
                        "field": "authors.details.name"
                    }
                }
            },
            "nested": {
                "path": "authors.details"
            }
        }
    },
    "size": 0
}
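With a nested aggregation the buckets sit one level deeper in the response, under the inner aggregation name. A sketch of reading them in Python is below; the helper name is mine, and the sample response is hand-written to match the usual shape of a nested aggregation result (an outer doc_count plus the inner terms aggregation), not captured from a real cluster.

```python
def nested_unique_values(response, nested_name, terms_name):
    """Return bucket keys of a terms aggregation wrapped in a nested aggregation."""
    nested = response.get("aggregations", {}).get(nested_name, {})
    buckets = nested.get(terms_name, {}).get("buckets", [])
    return [bucket["key"] for bucket in buckets]


# Hypothetical response for the query above.
sample_nested_response = {
    "aggregations": {
        "full_name": {
            "doc_count": 2,
            "author_details": {
                "buckets": [
                    {"key": "Brian Kernighan", "doc_count": 1},
                    {"key": "Charles Dickens", "doc_count": 1},
                ]
            },
        }
    }
}

print(nested_unique_values(sample_nested_response, "full_name", "author_details"))
```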

Source

2018-02-28 13:12:05
