How to skip the header when reading data from a CSV file in S3 and creating a table in AWS Athena

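I am trying to read CSV data from an S3 bucket and create a table over it in AWS Athena. The table is created, but the header line of my CSV file is not skipped. How can I skip the header when creating the table?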

Example query:

CREATE EXTERNAL TABLE IF NOT EXISTS table_name (
    `event_type_id` string,
    `customer_id` string,
    `date` string,
    `email` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = "|", "quoteChar" = "\"")
LOCATION 's3://location/'
TBLPROPERTIES ("skip.header.line.count"="1");

skip.header.line.count does not seem to have any effect: the header row still appears in the table. I think AWS has an issue here. Is there another way to solve this?

Source

2017-08-03 Dinesh Kumar Paladhi

This is what works in Redshift:

Use table properties ('skip.header.line.count'='1'), together with any other properties you need, for example 'numRows'='100'. Here is an example:

create external table exreddb1.test_table 
(ID BIGINT 
,NAME VARCHAR 
) 
row format delimited 
fields terminated by ',' 
stored as textfile 
location 's3://mybucket/myfolder/' 
table properties ('numRows'='100', 'skip.header.line.count'='1');

Source

2017-12-08 22:52:39 TheWalkingData

Here is the AWS Redshift SQL documentation for CREATE EXTERNAL TABLE: http://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_TABLE.html – TheWalkingData

This is a known deficiency.

The best workaround I have seen was tweeted by Eric Hammond:

...WHERE date NOT LIKE '#%'

This appears to skip the header rows at query time. I am not sure exactly how it works, but it may be a way of skipping NULLs.
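The filter above only excludes rows whose date value starts with '#', so it helps when header lines begin with that character. For the table in the question, a similar query-time filter might look like the sketch below. It assumes the header line is read as an ordinary row, so its event_type_id field holds the literal text event_type_id; the view name table_name_no_header is hypothetical and not from the original post.

```sql
-- Sketch only: hides the header row at query time instead of at table creation.
-- Assumption: the header row's first field is the literal string 'event_type_id'.
-- table_name_no_header is a hypothetical name chosen for illustration.
CREATE OR REPLACE VIEW table_name_no_header AS
SELECT event_type_id, customer_id, "date", email
FROM table_name
WHERE event_type_id IS NULL              -- keep rows where the field is missing
   OR event_type_id <> 'event_type_id';  -- drop the row that repeats the column name
```

Querying the view instead of the base table then returns only data rows, at the cost of dropping any genuine data row whose event_type_id happens to equal the column name.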

Source

2017-08-03 23:11:57
