2017-03-01 55 views
2

Error loading XML data into a Hive table

I am trying to load an XML file into my Hive table. Here is my CREATE TABLE statement:

CREATE TABLE MYDATA(NAME STRING, AGE INT, SEX STRING) 
    ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe' 
    WITH SERDEPROPERTIES(
    "column.xpath.NAME"="/TAG/NAME/text()", 
    "column.xpath.AGE"="/TAG/AGE/int()", 
    "column.xpath.SEX"="/TAG/SEX/text()") 
    STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' 
    LOCATION '/home/sid/hivexmltab' 
    TBLPROPERTIES("xmlinput.start"="<TAG","xmlinput.end"="</TAG>"); 

My input file has the following format:

<TAG> 
<NAME>ABCD</NAME><AGE>25</AGE><SEX>male</SEX> 
<NAME>EFGH</NAME><AGE>23</AGE><SEX>female</SEX> 
</TAG> 

The output I want to see looks like this:

ABCD,25,male 
EFGH,23,female 

But the output I actually get looks like this:

<string>ABCDEFGH</string> NULL <string>malefemale</string> 

I am using the jar file hivexmlserde-1.0.5.3.jar for the XML SerDe.

Can anyone tell me what I am doing wrong here? Any help is appreciated.

Answers

1

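This is a poor XML structure...
Each <NAME>...</NAME><AGE>...</AGE><SEX>...</SEX> group should be wrapped in an additional tag. With the file as it stands, every XPath matches several nodes inside a single <TAG> record, so the columns have to be declared as arrays: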


CREATE EXTERNAL TABLE MYDATA 
(
    NAME array<string> 
    ,AGE  array<int> 
    ,SEX  array<string>  
) 
    ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe' 
    WITH SERDEPROPERTIES 
    (
     "column.xpath.NAME" = "TAG/NAME/text()" 
     ,"column.xpath.AGE" = "TAG/AGE/text()" 
     ,"column.xpath.SEX" = "TAG/SEX/text()" 
    ) 
    STORED AS 
    INPUTFORMAT  'com.ibm.spss.hive.serde2.xml.XmlInputFormat' 
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' 
    LOCATION  '/home/sid/hivexmltab' 
    TBLPROPERTIES 
    (
     "xmlinput.start" = "<TAG" 
     ,"xmlinput.end" = "</TAG>" 
    ) 
; 

select * from MYDATA 
; 

+-----------------+------------+-------------------+
| mydata.name     | mydata.age | mydata.sex        |
+-----------------+------------+-------------------+
| ["ABCD","EFGH"] | [25,23]    | ["male","female"] |
+-----------------+------------+-------------------+

To get one row per person, explode the arrays by position:

select NAME[pe.n] as name
      ,AGE[pe.n]  as age
      ,SEX[pe.n]  as sex
from MYDATA m
     lateral view posexplode(m.NAME) pe as n,x
;

+------+-----+--------+
| name | age | sex    |
+------+-----+--------+
| ABCD | 25  | male   |
| EFGH | 23  | female |
+------+-----+--------+
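
If the input file can be regenerated, the wrapping suggested above might look like the following sketch. The wrapper element name PERSON and the table name MYDATA_WRAPPED are hypothetical and appear nowhere in the question; the SerDe classes, location, and property keys are copied from the statements above. This has not been run against the asker's data, so treat it as an outline rather than a verified fix.

```sql
-- Assumed input layout (PERSON is a hypothetical wrapper element):
--   <TAG>
--   <PERSON><NAME>ABCD</NAME><AGE>25</AGE><SEX>male</SEX></PERSON>
--   <PERSON><NAME>EFGH</NAME><AGE>23</AGE><SEX>female</SEX></PERSON>
--   </TAG>

-- Each <PERSON> element becomes one record, so scalar column types suffice.
CREATE EXTERNAL TABLE MYDATA_WRAPPED (NAME STRING, AGE INT, SEX STRING)
    ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
    WITH SERDEPROPERTIES (
        "column.xpath.NAME" = "/PERSON/NAME/text()",
        "column.xpath.AGE"  = "/PERSON/AGE/text()",
        "column.xpath.SEX"  = "/PERSON/SEX/text()"
    )
    STORED AS
    INPUTFORMAT  'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
    LOCATION '/home/sid/hivexmltab'
    TBLPROPERTIES (
        "xmlinput.start" = "<PERSON>",
        "xmlinput.end"   = "</PERSON>"
    );

SELECT NAME, AGE, SEX FROM MYDATA_WRAPPED;
```

Because the record boundary is the wrapper element rather than <TAG>, each XPath should match a single node per record, and no posexplode step should be needed.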
+0

It works. This really helped us build a table structure suitable for loading XML files. – Sidhartha

1

Use text() everywhere; change the AGE mapping to:

"column.xpath.AGE"="/TAG/AGE/text()" 

You can change the data type later in the Hive table.

Remove the LOCATION clause from the CREATE TABLE statement:

LOCATION '/home/sid/hivexmltab' 

and instead use the LOAD command to load the data after creating the table:

load data local inpath '/home/sid/hivexmltab/XMLfile.xml' overwrite into table MYDATA; 
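
Put together, the steps in this answer might look like the sketch below. It declares AGE as STRING and converts it with CAST at query time, which is one possible reading of "change the data type later" and is an assumption rather than something the answer spells out. It also assumes each <TAG> record holds a single NAME/AGE/SEX group; with the input file shown in the question, scalar columns returned concatenated values.

```sql
-- No LOCATION clause: the table is created empty and filled by LOAD DATA.
CREATE TABLE MYDATA (NAME STRING, AGE STRING, SEX STRING)
    ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
    WITH SERDEPROPERTIES (
        "column.xpath.NAME" = "/TAG/NAME/text()",
        "column.xpath.AGE"  = "/TAG/AGE/text()",
        "column.xpath.SEX"  = "/TAG/SEX/text()"
    )
    STORED AS
    INPUTFORMAT  'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
    TBLPROPERTIES ("xmlinput.start" = "<TAG", "xmlinput.end" = "</TAG>");

-- Load the file from the local filesystem, replacing any existing table data.
LOAD DATA LOCAL INPATH '/home/sid/hivexmltab/XMLfile.xml' OVERWRITE INTO TABLE MYDATA;

-- Assumption: convert AGE when querying instead of altering the column type.
SELECT NAME, CAST(AGE AS INT) AS AGE, SEX FROM MYDATA;
```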