2017-06-09

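XML schema to Hive schema

I want to load XML files into a Hive table. I am using the XML SerDe (com.ibm.spss.hive.serde2.xml.XmlSerDe). I am able to load simple, flat XML files. When the XML has nested elements, I use Hive complex data types to store them (for example, array<struct>). Below is a sample of the XML I am trying to load. My goal is to load all elements, attributes, and text content into the Hive table.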

Given this XML:
<description action="up"> 
    <name action="aorup" ln="te"> 
    this is name1 
    </name> 
    <name action="aorup" ln="tm"> 
    this is name2 
    </name> 
    <name action="aorup" ln="hi"> 
    this is name2 
    </name> 
</description> 

I am trying to get the following Hive output:

{action:"up", name:[{action:"aorup", ln:"te", content:"this is name1"}, {action:"aorup", ln:"tm", content:"this is name2"}, {action:"aorup", ln:"hi", content:"this is name3"}]} 

I want to load this entire XML document into a single Hive column. I tried the following:

CREATE TABLE description(
description STRUCT< 
Action:STRING, 
name:ARRAY<STRUCT< 
    Action:STRING, ln:STRING, content:STRING 
    >> 
>) 
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe' 
WITH SERDEPROPERTIES (
"xml.processor.class"="com.ximpleware.hive.serde2.xml.vtd.XmlProcessor", 
"column.xpath.description"="/description") 
STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat' 
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' 
TBLPROPERTIES ("xmlinput.start"="<description ","xmlinput.end"= "</description>"); 

But I get null values for the Label field. Can someone help me?

Thanks

Answer

create external table description 
(
    description struct<action:string,description:array<struct<action:string,ln:string,name:string>>> 
) 
row format serde 'com.ibm.spss.hive.serde2.xml.XmlSerDe' 
with serdeproperties 
(
    "column.xpath.description" = "/description" 
) 
stored as 
inputformat  'com.ibm.spss.hive.serde2.xml.XmlInputFormat' 
outputformat 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' 
tblproperties 
(
    "xmlinput.start" = "<description " 
    ,"xmlinput.end" = "</description>" 
) 
; 

select * from description 
; 

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 
|                       description                       | 
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 
| {"action":"up","description":[{"action":"aorup","ln":"te","name":"this is name1"},{"action":"aorup","ln":"tm","name":"this is name2"},{"action":"aorup","ln":"hi","name":"this is name2"}]} | 
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 
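
To read individual fields back out of the nested column, a query along the following lines might help. This is only a sketch: it assumes the table and field names from the create statement above (description, action, ln, name), and the aliases d, t, and entry are made up for illustration.

```sql
-- Flatten the array of structs into one row per <name> element.
-- Assumes the table defined above; d, t and entry are illustrative aliases.
select
    d.description.action as description_action  -- attribute of <description>
   ,entry.action         as name_action         -- attribute of <name>
   ,entry.ln             as ln                  -- attribute of <name>
   ,entry.name           as name_text           -- text content of <name>
from description d
lateral view explode(d.description.description) t as entry
;
```

With the sample data, this should return one row per name element, each repeating the outer action value.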

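Thank you very much for your answer. Could you explain the 'create table' statement? It confuses me. I tried applying this solution to another nested XML schema, the one in this question: [link](https://stackoverflow.com/questions/44494364/complex-xml-schema-to-hive-schema), but I could not get it to work. Could you explain where I went wrong?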