How to skip indexing a document if a document with the same ID already exists?

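I'm using Spark to write a large amount of data to Elasticsearch. However, some of the documents (sometimes most of them) are duplicates, meaning they have the same ID. Since writing the data to ES takes a long time, I'd like to know how to skip indexing a document whose ID already exists in ES.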

Something like:

if doc.id in ES:
    continue
else:
    doc.index(ES)
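A minimal runnable Python sketch of the pseudocode above. It assumes each document is a dict with an `id` key and that the IDs already stored in ES are available as a set; how those IDs are fetched from ES is not shown and is an assumption. It also skips repeated IDs within the same batch. Note that checking IDs one at a time against ES costs a round trip per document, so this filter alone does not make the write faster.

```python
from typing import Dict, Iterable, Iterator, Set


def skip_existing(docs: Iterable[Dict], existing_ids: Set[str]) -> Iterator[Dict]:
    """Yield only the docs whose "id" has not been seen yet.

    existing_ids stands in for the IDs already stored in ES (hypothetical:
    fetching them is out of scope here). Every yielded ID is added to the
    set, so duplicates inside the same batch are skipped as well.
    """
    seen = set(existing_ids)
    for doc in docs:
        if doc["id"] in seen:
            continue  # already in ES or already queued: skip indexing it
        seen.add(doc["id"])
        yield doc


if __name__ == "__main__":
    batch = [{"id": "1"}, {"id": "2"}, {"id": "2"}, {"id": "3"}]
    print([d["id"] for d in skip_existing(batch, {"1"})])  # ['2', '3']
```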

2017-02-21 Mazz

I'm not sure how this works with Spark, but in ES you can set the operation type (`op_type`).

PUT twitter/tweet/1?op_type=create 
{ 
    "user" : "kimchy", 
    "post_date" : "2009-11-15T14:12:12", 
    "message" : "trying out Elasticsearch" 
}

The only problem:

When create is used, the index operation will fail if a document by that id already exists in the index.
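The same `create` behaviour is available per document in the `_bulk` API, which is usually how large batches are written. Below is a minimal Python sketch that builds such a bulk body (newline-delimited JSON). It assumes each document is a dict with an `id` key, and it writes typeless action lines (ES 7+); on older clusters such as the 5.x style `twitter/tweet/1` URL above, a `_type` field would also be needed.

```python
import json
from typing import Dict, Iterable


def bulk_create_body(index: str, docs: Iterable[Dict]) -> str:
    """Build an NDJSON body for the _bulk API using the "create" action.

    With "create", ES rejects a document whose _id already exists instead
    of overwriting it, so existing documents are left untouched.
    The "id" key on each doc is an assumption of this sketch.
    """
    lines = []
    for doc in docs:
        source = {k: v for k, v in doc.items() if k != "id"}
        lines.append(json.dumps({"create": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # a bulk body must end with a newline


if __name__ == "__main__":
    print(bulk_create_body("twitter", [{"id": "1", "user": "kimchy"}]), end="")
```

From Spark, the elasticsearch-hadoop connector exposes the same choice through its `es.write.operation` setting (value `create`), with `es.mapping.id` naming the field to use as the document ID; check the connector documentation for your version, since existing IDs still produce errors there.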

2017-02-21 03:55:54

Thank you very much. The "only problem" is actually a big problem for me. Do you know any way to suppress the exception? – Mazz

@Mazz I would look for a way to suppress the error on the client side, because ES will just return a specific JSON body. –
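A minimal Python sketch of suppressing the error on the client side, assuming the documents are sent through the `_bulk` API with the `create` action. A `create` for an existing `_id` comes back in the response items with status 409 (`version_conflict_engine_exception`); the sketch treats those as "skipped" and returns only the other failures. The response dict is assumed to be the already-parsed JSON body of the bulk call.

```python
from typing import Dict, List


def real_bulk_errors(response: Dict) -> List[Dict]:
    """Return bulk item errors other than "document already exists".

    Items that failed with status 409 (the document ID already exists)
    are dropped; any other failed item is returned to the caller.
    """
    if not response.get("errors"):
        return []  # ES sets "errors": false when every item succeeded
    failures = []
    for item in response.get("items", []):
        # Each item is keyed by its action name, e.g. {"create": {...}}.
        for result in item.values():
            if "error" in result and result.get("status") != 409:
                failures.append(result)
    return failures
```

With this, a 409 simply means "this ID was already indexed, nothing to do", while errors such as mapping failures still surface.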