2016-07-15

Hive: create a table from a txt file that contains special characters

I have a txt file with the following content:

$ hdfs dfs -cat result/ 
[5,AA,ABE,US,AGU,MX,DNE0M0Z1,99991231,20160421,MX13,706,1,,33,,BOX,,,60,INNJ,31,2419221] 
[5,AA,ABE,US,AGU,MX,DNE0M0Z1,99991231,20160421,MX13,706,1,,33,,BOX,,,60,INNJ,31,2419244] 
[5,AA,ABE,US,AGU,MX,DNE0M0Z1,99991231,20160421,MX13,706,1,,33,,BOX,,,60,INNJ,31,2419319] 

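This file is generated by Spark on HDFS. What I want is to create a Hive table that reads from this file and shows the results in the table. The problem is that every record starts with [ and ends with ]. How can I do this without changing the txt, since it is generated automatically?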

My table currently looks like this:

DROP TABLE IF EXISTS RESULT_LATAM; 

CREATE EXTERNAL TABLE IF NOT EXISTS RESULT_LATAM 
(
    FARDET_NUM_RULE_TARIFF  BIGINT, 
    FARDET_CD_CARRIER   VARCHAR(3), 
    FARDET_CD_ORIGIN_CITY  VARCHAR(5), 
    FARDET_CD_ORIGIN_COUNTRY VARCHAR(2), 
    FARDET_CD_DEST_CITY   VARCHAR(5), 
    FARDET_CD_DEST_COUNTRY  VARCHAR(2), 
    FARDET_CD_FARE_BASIS  VARCHAR(8), 
    . 
    . 
    . 
) 
STORED AS TEXTFILE 
LOCATION '/user/ubuntu/result/'; 

Answer


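There is no direct way to achieve this. I'll illustrate the solution I have used with a smaller number of columns, but you will get the idea. You load the data into a staging table and do the transformation/cleansing while loading it into the main table, the kind of approach used when building a custom EDW:

Sample data: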

[5,A1] 
[6,A2] 
[7,A3] 

Create the staging table

create external table table_stg(x string,y string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
LINES TERMINATED BY '\n'; 

Create the main table

create external table table_main(x int,y VARCHAR(10)) 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
LINES TERMINATED BY '\n'; 

Load the data

LOAD DATA INPATH '/user/cloudera/result.txt' INTO TABLE table_stg; 

hive> select * from table_stg; 
OK 
[5 A1] 
[6 A2] 
[7 A3] 
Time taken: 0.086 seconds, Fetched: 3 row(s) 
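
Note that LOAD DATA INPATH moves the source file into the table's directory. If the Spark output has to stay where it is, one possible variation is to point an external staging table at the output directory instead of loading into it. The following is only a sketch: table_stg_ext is a hypothetical name, the LOCATION is the directory from the question's own CREATE TABLE statement, and the two columns assume the simplified sample data rather than the full record layout.

```sql
-- Sketch only: table_stg_ext is a hypothetical name.
-- The LOCATION is the directory used in the question; adjust it as needed.
-- The two STRING columns match the simplified sample data, not the full file.
CREATE EXTERNAL TABLE IF NOT EXISTS table_stg_ext (x STRING, y STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/ubuntu/result/';
```

Because the table is external, dropping it later leaves the files in that directory untouched.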

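Load the clean data

-- Strip the leading '[' from the first column and the trailing ']' from the last one
insert into table table_main 
select cast(regexp_replace(x, '\\[','') as int), regexp_replace(y, '\\]','') 
from table_stg; 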

Final output

hive> select * from table_main; 
OK 
5 A1 
6 A2 
7 A3 
Time taken: 0.155 seconds, Fetched: 3 row(s)
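
If keeping a second, cleaned copy of the data is not desirable, the same clean-up can probably be expressed as a view over the staging table, so the brackets are stripped at query time. This is a minimal sketch under the same two-column assumption; table_clean is a hypothetical name, while table_stg, x and y are the ones defined above.

```sql
-- Sketch only: table_clean is a hypothetical view name.
-- Anchoring the patterns with ^ and $ removes only the bracket that opens
-- the first column and the one that closes the last column.
CREATE VIEW IF NOT EXISTS table_clean AS
SELECT
    CAST(regexp_replace(x, '^\\[', '') AS INT) AS x,
    regexp_replace(y, '\\]$', '') AS y
FROM table_stg;

SELECT * FROM table_clean;
```

For the full file in the question, the same idea would apply to the first and last columns only; the columns in between need no cleaning.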