Efficient way to retrieve all _ids in ElasticSearch

What is the fastest way to get all _ids of a certain index from ElasticSearch? Is it possible using a simple query? One of my indices has around 20,000 documents.

2013-07-05 Mahoni

I found [this](https://github.com/elastic/elasticsearch/issues/17159) very helpful. – shellbye

Edit: please also read @Aleck Landgraf's answer.

Do you want only the Elasticsearch-internal _id field, or an id field from your documents?

For the former, try

curl -H 'Content-Type: application/json' 'http://localhost:9200/index/type/_search?pretty=true' -d '
{
    "query" : {
        "match_all" : {}
    },
    "stored_fields": []
}
'

Note, 2017 update: the post originally included "fields": [], but the name has since changed and stored_fields is the new value.

The result will contain only the "metadata" of your documents:

{ 
    "took" : 7, 
    "timed_out" : false, 
    "_shards" : { 
    "total" : 5, 
    "successful" : 5, 
    "failed" : 0 
    }, 
    "hits" : { 
    "total" : 4, 
    "max_score" : 1.0, 
    "hits" : [ { 
     "_index" : "index", 
     "_type" : "type", 
     "_id" : "36", 
     "_score" : 1.0 
    }, { 
     "_index" : "index", 
     "_type" : "type", 
     "_id" : "38", 
     "_score" : 1.0 
    }, { 
     "_index" : "index", 
     "_type" : "type", 
     "_id" : "39", 
     "_score" : 1.0 
    }, { 
     "_index" : "index", 
     "_type" : "type", 
     "_id" : "34", 
     "_score" : 1.0 
    } ] 
    } 
}
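The response above is plain JSON, so pulling out just the _id values is a one-line comprehension. A minimal Python sketch, assuming the response has already been parsed into a dict (for example by the official client or json.loads); extract_ids is a hypothetical helper name. It only sees the hits in a single response page (10 by default), so it does not by itself solve the "all ids" problem for a large index:

```python
import json


def extract_ids(response: dict) -> list:
    """Return the _id of every hit in a parsed _search response."""
    return [hit["_id"] for hit in response.get("hits", {}).get("hits", [])]


# Trimmed copy of the sample response above.
sample = json.loads("""
{"hits": {"total": 4, "max_score": 1.0, "hits": [
  {"_index": "index", "_type": "type", "_id": "36", "_score": 1.0},
  {"_index": "index", "_type": "type", "_id": "38", "_score": 1.0},
  {"_index": "index", "_type": "type", "_id": "39", "_score": 1.0},
  {"_index": "index", "_type": "type", "_id": "34", "_score": 1.0}]}}
""")

print(extract_ids(sample))  # ['36', '38', '39', '34']
```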

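For the latter, if you want to include fields from your document, list them in the request. The original answer added them to the fields array; since 5.x, where fields was removed, use _source filtering instead:

curl -H 'Content-Type: application/json' 'http://localhost:9200/index/type/_search?pretty=true' -d '
{
    "query" : {
        "match_all" : {}
    },
    "_source": ["document_field_to_be_returned"]
}
'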
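With _source filtering the requested field comes back under each hit's _source key; with the legacy fields parameter it came back as a list under the hit's fields key. A minimal Python sketch that pairs each _id with that field, assuming a top-level (non-nested) field; document_field_to_be_returned is the placeholder name from the query, and ids_with_field is a hypothetical helper:

```python
def ids_with_field(response: dict, field: str) -> dict:
    """Map each hit's _id to one field value, or None if the hit lacks it.

    Handles both response shapes: _source filtering (5.x and later) puts the
    value under hit["_source"]; the legacy `fields` parameter returned a list
    under hit["fields"].
    """
    result = {}
    for hit in response.get("hits", {}).get("hits", []):
        if field in hit.get("_source", {}):
            result[hit["_id"]] = hit["_source"][field]
        elif field in hit.get("fields", {}):
            values = hit["fields"][field]
            result[hit["_id"]] = values[0] if values else None
        else:
            result[hit["_id"]] = None
    return result


resp = {"hits": {"hits": [
    {"_id": "36", "_source": {"document_field_to_be_returned": "a"}},
    {"_id": "38", "fields": {"document_field_to_be_returned": ["b"]}},
    {"_id": "39", "_source": {}},
]}}
print(ids_with_field(resp, "document_field_to_be_returned"))
# {'36': 'a', '38': 'b', '39': None}
```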

2013-07-05 22:07:28 Thorsten

Won't this return only 10 results? –

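Doing a straight query is not the most efficient way. When you run a query, all the results have to be sorted before they are returned. Scroll and scan, mentioned in a response below, will be more efficient because it does not sort the result set before returning it. – aamiri

Doesn't work in 5.x: the 'fields' parameter has been removed; add the '"_source": false' param instead. Elasticsearch no longer supports fields in this query. –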

Another option:

curl 'http://localhost:9200/index/type/_search?pretty=true&fields='

will return _index, _type, _id and _score.
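On 5.x and later, where the fields= URL parameter is rejected, a URL-only variant is still possible. A minimal Python sketch that builds such a URL, assuming the _source and filter_path query-string parameters of the _search endpoint and a typeless /index/_search path; ids_only_search_url is a hypothetical helper. Like the request above, it returns a single page of hits, so the size caveats in the comments still apply:

```python
from urllib.parse import urlencode


def ids_only_search_url(base: str, index: str, size: int = 10) -> str:
    """Build a _search URL whose response contains only the hit ids."""
    params = {
        "_source": "false",              # drop document bodies (replaces fields=)
        "filter_path": "hits.hits._id",  # trim the response to just the ids
        "size": size,                    # hits per page; still capped by the server
    }
    return f"{base.rstrip('/')}/{index}/_search?{urlencode(params)}"


print(ids_only_search_url("http://localhost:9200", "index", size=100))
# http://localhost:9200/index/_search?_source=false&filter_path=hits.hits._id&size=100
```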

2014-08-18 06:43:44

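-1 Not great for accessing more than just a few documents; scan and scroll is the usual approach then. This is a "shortcut" that works, but it won't perform well and may fail on large indices. – PhaedrusTheGreek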

In 6.2: "request ... contains unrecognized parameter: [fields]" –

You can also do this in Python, which gives you a proper list:

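import elasticsearch

es = elasticsearch.Elasticsearch("http://localhost:9200")
your_index = "your-index-name"  # placeholder: replace with your index

# "fields" was removed in 5.x; "_source": False returns just the hit metadata.
# On 2.1+ a size above index.max_result_window (10,000 by default) is rejected,
# so raise that setting or use scan/scroll for indices this large.
res = es.search(
    index=your_index,
    body={"query": {"match_all": {}}, "size": 30000, "_source": False})

ids = [d['_id'] for d in res['hits']['hits']]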

2015-05-28 07:24:19

It's better to use scroll and scan to get the result list, so that Elasticsearch doesn't have to rank and sort the results.

With the elasticsearch-dsl Python lib this can be done as follows:

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

ES_INDEX = "your-index-name"  # placeholder: your index name

es = Elasticsearch("http://localhost:9200")
s = Search(using=es, index=ES_INDEX)

s = s.source(False)  # only get ids; s.fields([]) was removed in elasticsearch-dsl 5.0
ids = [h.meta.id for h in s.scan()]

Console log:

GET http://localhost:9200/my_index/my_doc/_search?search_type=scan&scroll=5m [status:200 request:0.003s] 
GET http://localhost:9200/_search/scroll?scroll=5m [status:200 request:0.005s] 
GET http://localhost:9200/_search/scroll?scroll=5m [status:200 request:0.005s] 
GET http://localhost:9200/_search/scroll?scroll=5m [status:200 request:0.003s] 
GET http://localhost:9200/_search/scroll?scroll=5m [status:200 request:0.005s] 
...

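Note: scroll pulls batches of results from a query and keeps the cursor open for a given time (1 minute, 2 minutes, which you can renew); scan disables sorting. The scan helper function returns a Python generator that can be safely iterated over.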
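The note above can be made concrete by writing the scroll loop by hand. This is a sketch of roughly what the scan helper does, not its actual source. It assumes an elasticsearch-py client called with the 7.x-style body= argument (adjust for newer client versions if needed), and it sorts by _doc, the cheapest order and the documented replacement for search_type=scan since 2.1. iter_ids_with_scroll is a hypothetical name:

```python
def iter_ids_with_scroll(es, index, batch_size=1000, keep_alive="2m"):
    """Yield every _id in `index` by paging through a scroll cursor.

    `es` is assumed to be an elasticsearch-py client (7.x-style calls).
    """
    resp = es.search(
        index=index,
        body={
            "query": {"match_all": {}},
            "_source": False,    # ids only, no document bodies
            "sort": ["_doc"],    # cheapest order; replaces search_type=scan
            "size": batch_size,  # hits per batch
        },
        scroll=keep_alive,       # how long the cursor stays open between batches
    )
    scroll_id = resp.get("_scroll_id")
    try:
        while True:
            hits = resp["hits"]["hits"]
            if not hits:         # an empty batch means the cursor is exhausted
                break
            for hit in hits:
                yield hit["_id"]
            resp = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        if scroll_id:
            # Free the server-side scroll context instead of waiting for it to expire.
            es.clear_scroll(scroll_id=scroll_id)


# Usage against a real cluster (not run here):
# from elasticsearch import Elasticsearch
# es = Elasticsearch("http://localhost:9200")
# ids = list(iter_ids_with_scroll(es, "your-index-name"))
```

Writing it as a generator keeps memory flat for large indices; wrap it in list() only when you actually need all ids at once.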

2015-06-15 21:57:48

The 'fields' method was removed in version 5.0.0 (see: https://elasticsearch-dsl.readthedocs.io/en/latest/Changelog.html?highlight=fields#id2). You should now use 's = s.source([])'. – illagrenan

The given link is unavailable (shows a 404). –

search_type=scan has been deprecated since 2.1 ([https://www.elastic.co/guide/en/elasticsearch/reference/2.1/breaking_21_search_changes.html](https://www.elastic.co/guide/en/elasticsearch/reference/2.1/breaking_21_search_changes.html)); noting this for anyone searching for the error text. – aleha

Inspired by @Aleck-Landgraf's answer, what worked for me was using the scan function directly in the standard elasticsearch Python API:

from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

es = Elasticsearch("http://localhost:9200")
# "_source": False replaces the removed "fields": []; doc_type is gone in newer versions.
for dobj in scan(es,
                 query={"query": {"match_all": {}}, "_source": False},
                 index="your-index-name"):
    print(dobj["_id"])

2016-01-16 22:39:47

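Elaborating on the two answers by @Robert-Lujo and @Aleck-Landgraf (someone with the permissions could gladly move this to a comment): if you don't want to print but rather collect everything from the returned generator into a list, here is what I use: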

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(hosts=[YOUR_ES_HOST])
# Same scan call as the other answers; "_source": False skips the document bodies.
a = helpers.scan(es, query={"query": {"match_all": {}}, "_source": False}, scroll='1m', index=INDEX_NAME)

IDs = [aa['_id'] for aa in a]

2016-02-10 17:16:31

-1

Url -> http://localhost:9200/<index>/<type>/_query 
http method -> DELETE 
Query -> {"query": {"match_all": {}}, "size": 30000, "fields": ["_id"]}

2016-10-04 08:47:50

For Elasticsearch 5.x, you can use the "_source" field:

GET /_search 
{ 
    "_source": false, 
    "query" : { 
     "term" : { "user" : "kimchy" } 
    } 
}

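"fields" is no longer supported. (Error: "The field [fields] is no longer supported, please use [stored_fields] to retrieve stored fields or _source filtering if the field is not stored")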
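The version split described across these answers (fields before 5.x, _source filtering or stored_fields after) can be wrapped in a small helper. A sketch assuming you already know the cluster's major version (for example from the version number reported by es.info()); ids_only_body is a hypothetical name, and it picks "_source": false for 5.x and later, as this answer does:

```python
def ids_only_body(es_major_version: int, query=None) -> dict:
    """Build a _search body that returns hit metadata (_id etc.) but no document fields."""
    body = {"query": query or {"match_all": {}}}
    if es_major_version >= 5:
        body["_source"] = False  # "fields" was removed in 5.x
    else:
        body["fields"] = []      # pre-5.x way to suppress document fields
    return body


print(ids_only_body(5, {"term": {"user": "kimchy"}}))
# {'query': {'term': {'user': 'kimchy'}}, '_source': False}
```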

2016-11-14 04:25:52 Nav

Bonus points. Elasticsearch error messages mostly don't seem very googlable :( – AmericanUmlaut
