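Timestamp field shows 1970-01-01 on EMR
I have an external Hive table that points to Parquet files written to S3 by a Spark job. The table has date and timestamp fields, and when I query it through Hive I get the correct dates.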
CREATE EXTERNAL TABLE events(
event_date date,
event_timestamp timestamp,
event_name string,
event_category string)
PARTITIONED BY (
dateid int)
STORED AS PARQUET
LOCATION 's3a://somebucket/events';
hive> SELECT event_timestamp, event_date from events limit 10;
2017-01-02 13:40:23 2017-01-02
2017-01-02 13:40:23.013 2017-01-02
2017-01-02 13:40:23.419 2017-01-02
2017-01-02 18:51:57.637 2017-01-02
2017-01-02 18:52:03.512 2017-01-02
2017-01-02 18:52:03.769 2017-01-02
2017-01-02 18:52:30.945 2017-01-02
2017-01-02 18:52:32.757 2017-01-02
2017-01-02 18:52:37.083 2017-01-02
2017-01-02 18:52:38.099 2017-01-02
However, when I run the same query through Presto on an EMR cluster (emr-5.6.0), every date comes back as 1970-01-01:
presto-cli --catalog hive --schema default
presto:default> SELECT event_timestamp, event_date from events limit 10;
1970-01-01 00:00:17.197 | 1970-01-01
1970-01-01 00:00:17.197 | 1970-01-01
1970-01-01 00:00:17.197 | 1970-01-01
1970-01-01 00:00:17.197 | 1970-01-01
1970-01-01 00:00:17.197 | 1970-01-01
1970-01-01 00:00:17.197 | 1970-01-01
1970-01-01 00:00:17.197 | 1970-01-01
1970-01-01 00:00:17.197 | 1970-01-01
1970-01-01 00:00:17.197 | 1970-01-01
1970-01-01 00:00:17.197 | 1970-01-01
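The cluster is running Presto version 0.170. Is there a known problem with Hive timestamp fields in Parquet files when they are queried through Presto?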
You could try setting `hive.parquet.use-column-names = true` in the Hive connector configuration so that Presto matches columns by name.
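To see why a column mismatch could plausibly produce the output above, here is a minimal Python sketch of the arithmetic only. It assumes, without confirmation from the question, that the raw value 17197 is a small integer from some other column (for example a date stored as days since the epoch) that is being decoded as milliseconds since the epoch. The helper names `as_millis_timestamp` and `as_days_date` are hypothetical and are not part of Presto or Hive.

```python
from datetime import date, datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def as_millis_timestamp(raw: int) -> datetime:
    """Interpret an integer as milliseconds since the Unix epoch."""
    return EPOCH + timedelta(milliseconds=raw)


def as_days_date(raw: int) -> date:
    """Interpret the same integer as days since the Unix epoch."""
    return date(1970, 1, 1) + timedelta(days=raw)


# Assumption: 17197 is the raw integer behind the Presto output shown above.
raw = 17197
print(as_millis_timestamp(raw))  # 1970-01-01 00:00:17.197000
print(as_days_date(raw))         # 2017-01-31
```

Read as milliseconds, the value reproduces the 00:00:17.197 seen in Presto; read as days, it lands in January 2017, close to the dates Hive returns. That is consistent with columns being matched by position rather than by name, but it does not prove it.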