AWS Datapipeline: Moving data from an RDS database (Postgres) to Redshift using a pipeline

Basically, I am trying to transfer data from Postgres to Redshift using AWS Data Pipeline. The process I am following is:

1. Write a pipeline (CopyActivity) that moves data from Postgres to S3
2. Write a pipeline (RedShiftCopyActivity) that moves data from S3 to Redshift
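The two-stage flow can be modeled locally to make one point concrete: the file staged in S3 is just the exported rows, with no record of what was loaded into Redshift on earlier runs. Whether re-running step 2 creates duplicates therefore depends entirely on how the load treats rows already in the target table. This is a minimal sketch that assumes the staged file is plain CSV (the real format depends on how the S3 data node is configured); `export_to_csv` and `load_from_csv` are hypothetical helpers, not Data Pipeline APIs.

```python
import csv
import io


def export_to_csv(rows, columns):
    """Serialize row dicts to CSV text (hypothetical stand-in for the S3 staging file)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in rows:
        writer.writerow([row[c] for c in columns])
    return buf.getvalue()


def load_from_csv(text, columns):
    """Parse staged CSV back into row dicts; every value comes back as a str."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(columns, record)) for record in reader]
```

For example, exporting `[{"id": 1, "name": "Acme"}]` and loading it back yields `[{"id": "1", "name": "Acme"}]`; nothing in the staged text says whether that row already exists downstream.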

Both pipelines I wrote move the data perfectly, but the problem is that the data is being duplicated in the Redshift database.

For example, below is the data from a table called company in the Postgres database:

After a successful run of the S3 to Redshift (RedShiftCopyActivity) pipeline, the data was copied, but it was duplicated as shown below:
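Since the table contents themselves are not shown here, a quick way to confirm and quantify the duplication is to count rows per primary-key value. The sketch below does this on in-memory rows and assumes `id` is the key column (as in the `primaryKeys` field of the definition); against Redshift itself the equivalent check is a `SELECT id, COUNT(*) FROM company GROUP BY id HAVING COUNT(*) > 1` query. `duplicate_keys` is a hypothetical helper.

```python
from collections import Counter


def duplicate_keys(rows, key="id"):
    """Return {key_value: count} for every key value that appears more than once."""
    counts = Counter(row[key] for row in rows)
    return {k: n for k, n in counts.items() if n > 1}
```

If every key shows up exactly twice after one extra run, the load step is adding the staged rows again rather than matching them to existing ones.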

Below are some parts of the definition of the RedShiftCopyActivity (S3 to Redshift) pipeline:

```python
# Part of the S3 -> Redshift pipeline definition. RedShiftSchemaName is a
# variable defined elsewhere in the script; the objects referenced via
# refValue (RedshiftDatabaseId_S34X5, s3_input_data, ResourceId_z9RNH,
# DefaultScheduleTime) are defined in other parts of the pipeline.
pipeline_definition = [
    {
        # Target table in Redshift
        "id": "redshift_database_instance_output",
        "name": "redshift_database_instance_output",
        "fields": [
            {"key": "database", "refValue": "RedshiftDatabaseId_S34X5"},
            {"key": "primaryKeys", "stringValue": "id"},
            {"key": "type", "stringValue": "RedshiftDataNode"},
            {"key": "tableName", "stringValue": "company"},
            {"key": "schedule", "refValue": "DefaultScheduleTime"},
            {"key": "schemaName", "stringValue": RedShiftSchemaName},
        ],
    },
    {
        # Activity that copies the S3 input into the Redshift table
        "id": "CopyS3ToRedshift",
        "name": "CopyS3ToRedshift",
        "fields": [
            {"key": "output", "refValue": "redshift_database_instance_output"},
            {"key": "input", "refValue": "s3_input_data"},
            {"key": "runsOn", "refValue": "ResourceId_z9RNH"},
            {"key": "type", "stringValue": "RedshiftCopyActivity"},
            {"key": "insertMode", "stringValue": "KEEP_EXISTING"},
            {"key": "schedule", "refValue": "DefaultScheduleTime"},
        ],
    },
]
```
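The definition above is a list of objects, each with an `id`, a `name`, and a list of `key`/`stringValue`/`refValue` fields (the same shape boto3's `put_pipeline_definition(pipelineObjects=...)` accepts). When experimenting with different `insertMode` values it can help to change the field programmatically and reject anything outside the four documented values. This is a sketch only; `set_field` and `set_insert_mode` are hypothetical helpers, and the test uses a trimmed-down stand-in for the real definition.

```python
VALID_INSERT_MODES = {"KEEP_EXISTING", "OVERWRITE_EXISTING", "TRUNCATE", "APPEND"}


def set_field(definition, object_id, key, string_value):
    """Set (or add) a stringValue field on the pipeline object with the given id."""
    for obj in definition:
        if obj["id"] != object_id:
            continue
        for field in obj["fields"]:
            if field["key"] == key:
                field.pop("refValue", None)  # a field holds either a string or a reference
                field["stringValue"] = string_value
                return definition
        obj["fields"].append({"key": key, "stringValue": string_value})
        return definition
    raise KeyError(f"no pipeline object with id {object_id!r}")


def set_insert_mode(definition, activity_id, mode):
    """Change insertMode on a RedshiftCopyActivity object, validating the value first."""
    if mode not in VALID_INSERT_MODES:
        raise ValueError(f"invalid insertMode {mode!r}; expected one of {sorted(VALID_INSERT_MODES)}")
    return set_field(definition, activity_id, "insertMode", mode)
```

Usage against the definition above would be `set_insert_mode(pipeline_definition, "CopyS3ToRedshift", "OVERWRITE_EXISTING")`; the helper only edits the local list and does not contact AWS.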

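So, according to the RedshiftCopyActivity documentation, we need to use insertMode to describe how the data should behave (insert/update/delete) when it is copied into the database table, as below: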

insertMode : Determines what AWS Data Pipeline does with pre-existing data in the target table that overlaps with rows in the data to be loaded. Valid values are KEEP_EXISTING, OVERWRITE_EXISTING, TRUNCATE and APPEND. KEEP_EXISTING adds new rows to the table, while leaving any existing rows unmodified. KEEP_EXISTING and OVERWRITE_EXISTING use the primary key, sort, and distribution keys to identify which incoming rows to match with existing rows, according to the information provided in Updating and inserting new data in the Amazon Redshift Database Developer Guide. TRUNCATE deletes all the data in the destination table before writing the new data. APPEND will add all records to the end of the Redshift table. APPEND does not require a primary, distribution key, or sort key so items that may be potential duplicates may be appended.
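The four modes described above can be illustrated with a small in-memory model. This is an assumption-laden sketch, not how Redshift implements them: rows are dicts and a single column (`key`, default `id`) stands in for the primary, sort, and distribution keys that the documentation says are used for matching. `apply_insert_mode` is a hypothetical name.

```python
VALID_MODES = ("KEEP_EXISTING", "OVERWRITE_EXISTING", "TRUNCATE", "APPEND")


def apply_insert_mode(existing, incoming, mode, key="id"):
    """Return the target table after loading `incoming` under `mode` (model only)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown insertMode: {mode}")
    if mode == "TRUNCATE":
        # Delete everything in the target, then write the new data.
        return list(incoming)
    if mode == "APPEND":
        # No key matching at all: overlapping rows become duplicates.
        return list(existing) + list(incoming)
    if mode == "KEEP_EXISTING":
        # Existing rows win; only incoming rows with unseen keys are added.
        existing_keys = {row[key] for row in existing}
        return list(existing) + [row for row in incoming if row[key] not in existing_keys]
    # OVERWRITE_EXISTING: incoming rows replace existing rows with the same key.
    incoming_keys = {row[key] for row in incoming}
    kept = [row for row in existing if row[key] not in incoming_keys]
    return kept + list(incoming)
```

In this model, reloading the same rows under KEEP_EXISTING leaves the table unchanged, and only APPEND doubles them. If a real KEEP_EXISTING load still produces duplicates, that points at the incoming rows not being matched to existing ones by key, which the documentation ties to the table's primary, sort, and distribution keys.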

So my requirement is: