How do I delete previously indexed documents in Elasticsearch?

I have two documents indexed as shown below.

Document 1

{
    "_index": "custom-design",
    "_type": "cars",
    "_id": "porche129",
    "_score": 1.2413527,
    "_source": {
        "clientID": "ps1233443",
        "customisation": "yes",
        "userType": "heavy",
        "totalBilling": 3000
    }
}

Document 2

{
    "_index": "custom-design",
    "_type": "cars",
    "_id": "porche232",
    "_score": 1.2413527,
    "_source": {
        "clientID": "ps1233443",
        "customisation": "yes",
        "userType": "heavy",
        "totalBilling": 3000
    }
}

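As you can see, both documents were indexed with different IDs but identical content. Is it possible to detect and remove duplicate documents after they have been indexed?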

Source

2016-01-14 antony cena

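Ideally, you would create a hash for each document when indexing it. Since that is no longer possible for documents that are already indexed, let's use a script in a terms aggregation to find the duplicates.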

# Field names use escaped double quotes so the single-quoted shell string is not terminated early.
curl -XGET 'http://localhost:9200/Index/IndexType/_search?pretty=true' -d '{
    "size": 0,
    "aggs": {
        "duplicateCount": {
            "terms": {
                "script": "doc[\"clientID\"].value + doc[\"customisation\"].value + doc[\"userType\"].value + doc[\"totalBilling\"].value",
                "min_doc_count": 2
            },
            "aggs": {
                "duplicateDocuments": {
                    "top_hits": {}
                }
            }
        }
    }
}'

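Each bucket in the result groups documents whose concatenated field values are identical, so the duplicates appear together. Collect the IDs of the duplicates from those buckets and remove them with a bulk delete.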
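The step from the aggregation result to a bulk delete can be sketched in Python as follows. This is a minimal sketch, not part of the original answer: the function name `bulk_delete_body` is hypothetical, the response layout is assumed to follow the `duplicateCount` / `duplicateDocuments` aggregation names used in the query above, and the sample response is hand-written for illustration. It keeps the first hit in each bucket and emits delete actions for the rest. Note that `top_hits` returns only 3 hits per bucket by default, so buckets with more duplicates would probably need an explicit `size`.

```python
import json

def bulk_delete_body(response, agg="duplicateCount", hits="duplicateDocuments"):
    """Build a _bulk request body that deletes all but the first hit in each bucket."""
    lines = []
    for bucket in response["aggregations"][agg]["buckets"]:
        docs = bucket[hits]["hits"]["hits"]
        # Keep docs[0] as the surviving copy; every later hit is treated as a duplicate.
        for hit in docs[1:]:
            action = {"delete": {"_index": hit["_index"],
                                 "_type": hit["_type"],
                                 "_id": hit["_id"]}}
            lines.append(json.dumps(action))
    # The bulk API expects newline-delimited JSON ending with a newline.
    return "\n".join(lines) + "\n" if lines else ""

# Hand-written sample shaped like the aggregation response (illustrative only).
sample = {
    "aggregations": {"duplicateCount": {"buckets": [{
        "key": "ps1233443yesheavy3000",
        "doc_count": 2,
        "duplicateDocuments": {"hits": {"hits": [
            {"_index": "custom-design", "_type": "cars", "_id": "porche129"},
            {"_index": "custom-design", "_type": "cars", "_id": "porche232"},
        ]}},
    }]}}
}

body = bulk_delete_body(sample)
print(body, end="")
```

The resulting text could then be sent as the body of a POST request to the `_bulk` endpoint. Reviewing the generated actions before sending them is advisable, since the deletion cannot be undone.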

You can read more about these approaches here - https://qbox.io/blog/minimizing-document-duplication-in-elasticsearch
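For future indexing, the per-document hash mentioned at the start of this answer might look like the sketch below. It is an assumption about how one could implement the idea, not the answer author's code: `content_id` is a hypothetical helper, and SHA-1 over a key-sorted JSON serialization is just one reasonable choice. Using the resulting value as the document `_id` means identical content maps to the same ID, so re-indexing overwrites the existing document instead of creating a duplicate.

```python
import hashlib
import json

def content_id(source):
    """Return a deterministic ID derived only from the document's field values."""
    # sort_keys makes the serialization independent of the order of the keys.
    canonical = json.dumps(source, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()

doc_a = {"clientID": "ps1233443", "customisation": "yes",
         "userType": "heavy", "totalBilling": 3000}
# Same fields in a different order, as another client might send them.
doc_b = {"totalBilling": 3000, "userType": "heavy",
         "customisation": "yes", "clientID": "ps1233443"}

same = content_id(doc_a) == content_id(doc_b)
print(same)
```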

Source

2016-02-21 05:12:43
