Loading nested XML data into Hive using a SerDe
I am trying to load nested XML data into Hive. Sample data is shown below:
<CustomerOrders>
<Customers>
<CustID>ALFKI</CustID>
<Orders>
<OrderID>10643</OrderID>
<CustomerID>ALFKI</CustomerID>
<OrderDate>1997-08-25</OrderDate>
</Orders>
<Orders>
<OrderID>10692</OrderID>
<CustomerID>ALFKI</CustomerID>
<OrderDate>1997-10-03</OrderDate>
</Orders>
<CompanyName>Alfreds Futterkiste</CompanyName>
</Customers>
<Customers>
<CustID>ANATR</CustID>
<Orders>
<OrderID>10308</OrderID>
<CustomerID>ANATR</CustomerID>
<OrderDate>1996-09-18</OrderDate>
</Orders>
<CompanyName>Ana Trujillo Emparedados y helados</CompanyName>
</Customers>
</CustomerOrders>
Below is the command I used:
CREATE TABLE CUSTOMERORDERS(
CustID STRING,
Orders ARRAY<STRUCT<OrderID:STRING,CustomerID:STRING,OrderDate:STRING>>,
CompanyName STRING)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.CustID"="/Customers/CustID/text()",
"column.xpath.Orders"="/Customers/Orders",
"column.xpath.OrderID"="/Customers/Orders/OrderID",
"column.xpath.CustomerID"="/Customers/Orders/CustomerID",
"column.xpath.OrderDate"="/Customers/Orders/OrderDate",
"column.xpath.CompanyName"="/Customers/CompanyName/text()")
STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES ("xmlinput.start"="<Customers>","xmlinput.end"= "</Customers>");
The output I am getting is:
hive> select * from customerorders;
OK
ALFKI [{"orderid":null,"customerid":null,"orderdate":null},{"orderid":null,"customerid":null,"orderdate":null}] Alfreds Futterkiste
ANATR [{"orderid":null,"customerid":null,"orderdate":null}] Ana Trujillo Emparedados y helados
Time taken: 0.039 seconds, Fetched: 2 row(s)
I am getting null values for OrderID, CustomerID, and OrderDate inside the Orders array. Can anyone help me resolve this issue?
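To make the expected result concrete, here is a minimal sketch of the nested shape I want each row to have. It is only an illustration written in Python with the standard library's xml.etree.ElementTree; it does not use the SerDe, and the function name parse_customers is hypothetical. It parses the sample above into one (CustID, Orders, CompanyName) tuple per <Customers> element.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the question.
SAMPLE = """<CustomerOrders>
<Customers>
<CustID>ALFKI</CustID>
<Orders>
<OrderID>10643</OrderID>
<CustomerID>ALFKI</CustomerID>
<OrderDate>1997-08-25</OrderDate>
</Orders>
<Orders>
<OrderID>10692</OrderID>
<CustomerID>ALFKI</CustomerID>
<OrderDate>1997-10-03</OrderDate>
</Orders>
<CompanyName>Alfreds Futterkiste</CompanyName>
</Customers>
<Customers>
<CustID>ANATR</CustID>
<Orders>
<OrderID>10308</OrderID>
<CustomerID>ANATR</CustomerID>
<OrderDate>1996-09-18</OrderDate>
</Orders>
<CompanyName>Ana Trujillo Emparedados y helados</CompanyName>
</Customers>
</CustomerOrders>"""


def parse_customers(xml_text):
    """Hypothetical helper: return one (CustID, Orders, CompanyName) tuple per customer."""
    root = ET.fromstring(xml_text)
    rows = []
    for cust in root.findall("Customers"):
        # Each <Orders> element becomes one struct-like dict in the array.
        orders = [
            {
                "orderid": order.findtext("OrderID"),
                "customerid": order.findtext("CustomerID"),
                "orderdate": order.findtext("OrderDate"),
            }
            for order in cust.findall("Orders")
        ]
        rows.append((cust.findtext("CustID"), orders, cust.findtext("CompanyName")))
    return rows


if __name__ == "__main__":
    for row in parse_customers(SAMPLE):
        print(row)
```

In other words, the struct fields in the Orders column should carry the actual order values rather than null.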
Thanks
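I think I should not configure 'OrderID', 'CustomerID', and 'OrderDate' in 'SERDEPROPERTIES', because they are not table columns, so I removed them. I also tried '/text()' for 'Orders'. In that case I get 'NULL': 'hive> select * from customerorders; OK ALFKI NULL Alfreds Futterkiste ANATR NULL Ana Trujillo Emparedados y helados Time taken: 0.037 seconds, Fetched: 2 row(s)'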