HiveQL and XPath: how to extract values and replace certain characters
I have an XML blob, shown below, stored in a Hive log table.
<user>
<uid>1424324325</uid>
<attribs>
<field>
...
</field>
<field>
<name>first_name</name>
<value>Joh,n</value>
</field>
<field>
...
</field>
<field>
<name>last_name</name>
<value>D,oe</value>
</field>
<field>
...
</field>
</attribs>
</user>
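Each row of the Hive table holds information about a different user. I want to extract the uid, first name, and last name, removing any commas from the names.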
1424324325 John Doe
1424435463 Jane Smith
I am able to extract the values from the XML:
SELECT uid, fn, ln
FROM log_table
LATERAL VIEW explode(xpath(logs['users_updates'], '/user/uid/text()')) uids as uid
LATERAL VIEW explode(xpath(logs['users_updates'], '/user/attribs/field[name = "first_name"]/value/text()')) fns as fn
LATERAL VIEW explode(xpath(logs['users_updates'], '/user/attribs/field[name = "last_name"]/value/text()')) lns as ln;
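However, I am stumped trying to remove the unwanted commas (if present) from the first and last names.
When I try to extract the first name with either of the approaches shown below, the result is empty.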
LATERAL VIEW explode(xpath(logs['users_updates'], '/users/attribs/field[name = "first_name"]/value/replace(text(),",","")')) fns as fn
LATERAL VIEW explode(xpath(logs['users_updates'], '/users/attribs/field[name = "first_name"]/value/translate(text(),",","")')) fns as fn
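When I try it as shown below, replace fails with an invalid function error, while translate returns the data without removing the extra commas.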
LATERAL VIEW explode(xpath(logs['users_updates'], replace('/subscriberUpdates/updates/field[name = "first_name"]/value/text()',",",""))) fns as fn
LATERAL VIEW explode(xpath(logs['users_updates'], translate('/subscriberUpdates/updates/field[name = "first_name"]/value/text()',",",""))) fns as fn
How can I extract this information with the commas removed from the name values?
1424324325 John Doe
1424435463 Jane Smith
Final solution: here is the query that worked, as suggested by 延.
SELECT uid, regexp_replace(fn,",","") as fname, regexp_replace(ln,",","") as lname
FROM log_table
LATERAL VIEW explode(xpath(logs['users_updates'], '/user/uid/text()')) uids as uid
LATERAL VIEW explode(xpath(logs['users_updates'], '/user/attribs/field[name = "first_name"]/value/text()')) fns as fn
LATERAL VIEW explode(xpath(logs['users_updates'], '/user/attribs/field[name = "last_name"]/value/text()')) lns as ln;
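A possible variation on the query above, offered as an untested sketch: if every row holds exactly one <user> element (an assumption about the data, not something stated in the question), Hive's xpath_string UDF can return each value as a plain string. In that case the three LATERAL VIEW explode clauses are not needed, and regexp_replace can wrap each call directly.

```sql
-- Sketch only: assumes exactly one <user> element per row of log_table.
-- xpath_string returns the text of the first node matched by the path.
SELECT
  xpath_string(logs['users_updates'], '/user/uid') AS uid,
  regexp_replace(
    xpath_string(logs['users_updates'],
                 '/user/attribs/field[name = "first_name"]/value'),
    ',', '') AS fname,
  regexp_replace(
    xpath_string(logs['users_updates'],
                 '/user/attribs/field[name = "last_name"]/value'),
    ',', '') AS lname
FROM log_table;
```

If a blob could ever contain more than one matching field, the LATERAL VIEW form is probably the safer choice, since xpath_string would only see the first match.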
Thank you very much for the information. I was not aware of these limitations. As you suggested, I was able to achieve it by using regexp_replace in Hive. – rev
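The limitations mentioned in the comment are presumably those of XPath 1.0, which is what Hive's xpath UDFs evaluate as far as I know. XPath 1.0 has no replace() function and does not allow a function call as a path step, so value/translate(text(), ...) cannot work. It does provide translate(), which can be used as the outermost expression. The following is a hedged sketch, not a tested query. It uses xpath_string instead of xpath because the expression yields a string, not a node list.

```sql
-- Sketch only: translate() wraps the whole path as the outermost XPath 1.0
-- expression; characters in the second argument with no counterpart in the
-- third argument are removed from the result.
SELECT
  xpath_string(logs['users_updates'], '/user/uid') AS uid,
  xpath_string(logs['users_updates'],
    'translate(/user/attribs/field[name = "first_name"]/value, ",", "")') AS fname,
  xpath_string(logs['users_updates'],
    'translate(/user/attribs/field[name = "last_name"]/value, ",", "")') AS lname
FROM log_table;
```

This also suggests why the attempts in the question behaved as they did: wrapping the path string in Hive's own translate only strips commas from the path text, which contains none, so the extracted values came back unchanged.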