2015-11-24

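Error creating a Parquet table from CSV using Apache Drill

I am trying to create a Parquet table of over one million rows from a CSV extract generated from an Oracle database table. About 25 of those rows have an empty START_DATE, and the CTAS fails because it does not interpret "" as null. Any suggestions would be greatly appreciated.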

CREATE TABLE dfs.tmp.FOO as 
select cast(columns[0] as INT) as `PRODUCT_ID`, 
cast(columns[1] as INT) as `LEG_ID`, 
columns[2] as `LEG_TYPE`, 
to_timestamp(columns[3], 'dd-MMM-yy HH.mm.ss.SSSSSS a') as `START_DATE` 
from dfs.`c:\work\prod\data\foo.csv`; 



Error: SYSTEM ERROR: IllegalArgumentException: Invalid format "" 

Answers


You can always add a CASE expression that maps the empty entries to null before they reach to_timestamp:

CREATE TABLE dfs.tmp.FOO as 
select cast(columns[0] as INT) as `PRODUCT_ID`, 
cast(columns[1] as INT) as `LEG_ID`, 
columns[2] as `LEG_TYPE`, 
CASE WHEN columns[3] = '' THEN null 
    ELSE to_timestamp(columns[3], 'dd-MMM-yy HH.mm.ss.SSSSSS a') 
END as `START_DATE` 
from dfs.`c:\work\prod\data\foo.csv`; 

That worked, Chris. Thanks. – indus73


You can also use the NULLIF() function, as follows:

CREATE TABLE dfs.tmp.FOO as 
select cast(columns[0] as INT) as `PRODUCT_ID`, 
cast(columns[1] as INT) as `LEG_ID`, 
columns[2] as `LEG_TYPE`, 
to_timestamp(NULLIF(columns[3],''), 'dd-MMM-yy HH.mm.ss.SSSSSS a') as `START_DATE` 
from dfs.`c:\work\prod\data\foo.csv`; 

NULLIF converts the empty string to null, so the conversion no longer fails on those rows.
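The following is a minimal sketch showing why the CASE and NULLIF answers behave the same way. It uses Python's built-in sqlite3 module as a stand-in for Drill (an assumption; this is not Drill). The registered `to_timestamp` is a hypothetical Python function that mimics the failure in the question: it raises on an empty string and returns NULL for NULL input. The table `foo`, its columns, the sample timestamp, and the strptime format (which only approximates the Joda-style pattern above) are all made up for illustration.

```python
import sqlite3
from datetime import datetime

# Hypothetical strptime approximation of 'dd-MMM-yy ... a' (12-hour clock + AM/PM).
FMT = "%d-%b-%y %I.%M.%S.%f %p"


def to_timestamp(value, fmt):
    """Hypothetical stand-in for Drill's to_timestamp (sketch only).

    Passes NULL (None) through unchanged; an empty string makes strptime
    raise ValueError, which sqlite3 reports as an OperationalError.
    """
    if value is None:
        return None
    return datetime.strptime(value, fmt).isoformat(sep=" ")


def load(conn, start_date_expr):
    """Run the SELECT with the given START_DATE expression and return all rows."""
    sql = f"SELECT leg_id, {start_date_expr} AS start_date FROM foo ORDER BY leg_id"
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.create_function("to_timestamp", 2, to_timestamp)
conn.execute("CREATE TABLE foo (leg_id INTEGER, raw_date TEXT)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?)",
    [(1, "24-NOV-15 11.23.45.123456 PM"), (2, "")],  # row 2 has a blank date
)

# 1. Parsing the raw column directly fails on the blank row, as in the question.
direct_failed = False
try:
    load(conn, f"to_timestamp(raw_date, '{FMT}')")
except sqlite3.OperationalError:
    direct_failed = True

# 2. CASE: the parser is never called for blank strings.
case_rows = load(
    conn,
    f"CASE WHEN raw_date = '' THEN NULL ELSE to_timestamp(raw_date, '{FMT}') END",
)

# 3. NULLIF: '' becomes NULL before the parser sees it, and NULL passes through.
nullif_rows = load(conn, f"to_timestamp(NULLIF(raw_date, ''), '{FMT}')")

print(direct_failed)  # True
print(case_rows)      # [(1, '2015-11-24 23:23:45.123456'), (2, None)]
print(nullif_rows == case_rows)  # True
```

Both rewrites yield a null START_DATE for the blank row instead of aborting the whole statement. NULLIF is shorter, and CASE is easier to extend, for example to also treat whitespace-only values as null.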