2014-12-28 62 views
0

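How do I parse XML element nodes in a Pig script?

I am using Pig Latin to process a large XML dump. I am trying to extract the values of XML nodes such as location and temp_c in Pig Latin. The file looks like this: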

<?xml version="1.0" encoding="ISO-8859-1"?> 
<?xml-stylesheet href="latest_ob.xsl" type="text/xsl"?> 
<current_observation version="1.0" 
    xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:noNamespaceSchemaLocation="http://www.weather.gov/view/current_observation.xsd"> 
    <credit>NOAA's National Weather Service</credit> 
    <credit_URL>http://weather.gov/</credit_URL> 
    <image> 
     <url>http://weather.gov/images/xml_logo.gif</url> 
     <title>NOAA's National Weather Service</title> 
     <link>http://weather.gov</link> 
    </image> 
    <suggested_pickup>15 minutes after the hour</suggested_pickup> 
    <suggested_pickup_period>60</suggested_pickup_period> 
    <location>Unknown Station</location> 
    <station_id>51WH0</station_id> 
    <observation_time>Last Updated on Dec 23 2014, 11:00 pm LST</observation_time> 
     <observation_time_rfc822>Tue, 23 Dec 2014 23:00:00 +1000</observation_time_rfc822> 
    <temperature_string>71.4 F (21.9 C)</temperature_string> 
    <temp_f>71.4</temp_f> 
    <temp_c>21.9</temp_c> 
    <water_temp_f>75.9</water_temp_f> 
    <water_temp_c>24.4</water_temp_c> 
    <wind_string>North at 24.6 MPH (21.38 KT)</wind_string> 
    <wind_dir>North</wind_dir> 
    <wind_degrees>20</wind_degrees> 
    <wind_mph>24.6</wind_mph> 
    <wind_gust_mph>0.0</wind_gust_mph> 
    <wind_kt>21.38</wind_kt> 
    <pressure_string>1015.0 mb</pressure_string> 
    <pressure_mb>1015.0</pressure_mb> 
    <dewpoint_string>58.1 F (14.5 C)</dewpoint_string> 
    <dewpoint_f>58.1</dewpoint_f> 
    <dewpoint_c>14.5</dewpoint_c> 
</current_observation> 

Answers

1

This may help you; give it a try.

REGISTER piggybank.jar;
DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();

-- Load each <current_observation> element as one chararray record.
A = LOAD 'xmls/your_file.xml' USING org.apache.pig.piggybank.storage.XMLLoader('current_observation') AS (x:chararray);

-- Pull the two node values out of each record with XPath.
B = FOREACH A GENERATE XPath(x, 'current_observation/location'), XPath(x, 'current_observation/temp_c');
DUMP B;
+0

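Hi Ravi, I tried this, but the root element carries several attributes, and because of them the result cannot be dumped. Every XML file has the same format.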
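
If XPath returns nothing for this file, one possible workaround is to skip XPath and extract the values from each record with Pig's built-in REGEX_EXTRACT. This is a minimal sketch, not a tested solution. It assumes each element appears once per record and contains plain text only. The load path is the placeholder from the answer above, and the aliases and field names are illustrative.

```pig
REGISTER piggybank.jar;

-- Load each <current_observation> element as one chararray record.
A = LOAD 'xmls/your_file.xml'
    USING org.apache.pig.piggybank.storage.XMLLoader('current_observation')
    AS (x:chararray);

-- (?s) lets '.' match newlines, so each pattern covers the whole record.
-- Group 1 captures the text between the opening and closing tags.
B = FOREACH A GENERATE
    REGEX_EXTRACT(x, '(?s).*<location>([^<]*)</location>.*', 1) AS location,
    (double) REGEX_EXTRACT(x, '(?s).*<temp_c>([^<]*)</temp_c>.*', 1) AS temp_c;

DUMP B;
```

Because the patterns match on element names only, the attributes on the root element should not matter here. The trade-off is that a regular expression does not understand nesting or repeated elements.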

+0

Use **StreamingXMLLoader**; it should clear up your confusion.

0

Use this:

data = LOAD '/path/your_file.xml' 
     USING org.apache.pig.piggybank.storage.StreamingXMLLoader(
      'current_observation', 
      'credit, credit_URL, image, suggested_pickup, suggested_pickup_period, location, station_id, observation_time,temp_f, temp_c, water_temp_f, water_temp_c, wind_string, wind_dir, wind_degrees, wind_mph, wind_gust_mph, wind_kt, pressure_string, pressure_mb, dewpoint_string, dewpoint_f, dewpoint_c' 
     ) AS (
      credit: {(attr:map[], content:chararray)},
      credit_URL: {(attr:map[], content:chararray)},

     . 
     . 
     . 
     ); 

dump data; 
+0

I am getting these errors:
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.pig.piggybank.storage.StreamingXMLLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:682)
    at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1320)
    ... 26 more
2014-12-29 12:26:40,739 [main] ERROR

+0

org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve org.apache.pig.piggybank.storage.StreamingXMLLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Details at logfile: /home/hduser/Desktop/pig_1419836199018.log

+0

I have already registered piggybank.jar.