2013-08-20

Processing an XML file using Apache Pig

My XML file looks like this:

<CATALOG> 
<CD> 
<TITLE>hadoop developer</TITLE> 
<ARTIST>ajay</ARTIST> 
<COUNTRY>india</COUNTRY> 
<COMPANY>ITC</COMPANY> 
<PRICE>10.90</PRICE> 
<YEAR>2013</YEAR> 
</CD> 
</CATALOG> 

I used a regular expression, but I don't know why I'm not getting the desired output. My code is as follows:

REGISTER /usr/lib/pig/piggybank.jar

A = load 'input.xml' using org.apache.pig.piggybank.storage.XMLLoader('CATALOG') as (x: chararray);
B = foreach A GENERATE FLATTEN(REGEX_EXTRACT_ALL(x,'<CATALOG>\n*<CD>\n<TITLE>(.*)</TITLE>\n*<ARTIST>(.*)</ARTIST>\n*<COUNTRY>(.*)</COUNTRY>\n*<COMPANY>(.*)</COMPANY>\n*<PRICE>(.*)</PRICE>\n*<YEAR>(.*)</YEAR>\n*</CD>\\n*</CATALOG>')) as (name:chararray, words:chararray);

And my output is as follows:

2013-08-20 12:40:24,043 [main] INFO 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success! 

2013-08-20 12:40:24,044 [main] WARN 
org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 

2013-08-20 12:40:24,047 [main] INFO 
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1 

2013-08-20 12:40:24,047 [main] INFO 
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1 

What is wrong with it? Thanks.


@mr2ert Thanks for the edit, but please help me figure out what is wrong here. –


I just ran the script and everything worked fine. You will have to be more specific about what is going wrong. – mr2ert

Answers

0

How about this:

A = load 'input.xml' using org.apache.pig.piggybank.storage.XMLLoader('CD') 
    as (x:chararray); 

B = foreach A GENERATE FLATTEN(REGEX_EXTRACT_ALL(x, 
     '<CD>\\n\\s*<TITLE>(.*)</TITLE>\\n\\s*<ARTIST>(.*)</ARTIST>\\n\\s*<COUNTRY>(.*)</COUNTRY>\\n\\s*<COMPANY>(.*)</COMPANY>\\n\\s*<PRICE>(.*)</PRICE>\\n\\s*<YEAR>(.*)</YEAR>\\n\\s*</CD>')) 
    as (title:chararray, artist:chararray, country:chararray, company:chararray, price:double, year:int); 

Expected output: hadoop developer | ajay | india | ITC | 10.90 | 2013, but what I get looks like () () () () () () () () () () () () () () () () () () () () () () () () () () () () () () () () () () () () () () –


That's strange, because I tested it. Did you use the same XML as the input you posted? –


Your script runs without problems in local mode, but in MapReduce (MR) mode I hit the same problem again. –

1

Try this; it is tested and works correctly.

Create an XML folder under /user/hue/ and copy catalog.xml (your XML file) into that folder:

REGISTER piggybank.jar ; 

xmldata = LOAD 'XML/catalog.xml' USING org.apache.pig.piggybank.storage.XMLLoader('CD') as(doc:chararray); 

data = FOREACH xmldata GENERATE FLATTEN(REGEX_EXTRACT_ALL(doc,'<CD>\\s*<TITLE>(.*)</TITLE>\\s*<ARTIST>(.*)</ARTIST>\\s*<COUNTRY>(.*)</COUNTRY>\\s*<COMPANY>(.*)</COMPANY>\\s*<PRICE>(.*)</PRICE>\\s*<YEAR>(.*)</YEAR>\\s*</CD>')) AS (title:chararray, artist:chararray, country:chararray, company:chararray, price:chararray, year:chararray); 

DESCRIBE data; 

dump data; 
0

This should work:

A = LOAD 'xml-files/cd.xml' using org.apache.pig.piggybank.storage.XMLLoader('CD') as (x:chararray); 

B = foreach A GENERATE FLATTEN(REGEX_EXTRACT_ALL(x,'<CD>\\s*<TITLE>(.*)</TITLE>\\s*<ARTIST>(.*)</ARTIST>\\s*<COUNTRY>(.*)</COUNTRY>\\s*<COMPANY>(.*)</COMPANY>\\s*<PRICE>(.*)</PRICE>\\s*<YEAR>(.*)</YEAR>\\s*</CD>'));
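
For anyone wondering why the answers in this thread use `\s*` between the tags instead of `\n*`: as far as I know, REGEX_EXTRACT_ALL only returns groups when the pattern matches the entire string, so a single stray space between two tags is enough to make the match fail. Below is a minimal sketch that checks both variants outside Pig. It uses Python's `re.fullmatch` as a stand-in for the Java regex engine that Pig uses (an assumption, though the two behave the same for this simple pattern). The `record` string, with a trailing space after each tag, and the helper names `build_pattern` and `extract` are made up for illustration. This only demonstrates the whitespace sensitivity; it does not explain the local vs. MapReduce difference mentioned in the comments.

```python
import re

# Hypothetical record, roughly what XMLLoader('CD') might hand to the regex.
# Note the trailing space before each newline (an assumption for this demo).
record = (
    "<CD> \n"
    "<TITLE>hadoop developer</TITLE> \n"
    "<ARTIST>ajay</ARTIST> \n"
    "<COUNTRY>india</COUNTRY> \n"
    "<COMPANY>ITC</COMPANY> \n"
    "<PRICE>10.90</PRICE> \n"
    "<YEAR>2013</YEAR> \n"
    "</CD>"
)

TAGS = ["TITLE", "ARTIST", "COUNTRY", "COMPANY", "PRICE", "YEAR"]


def build_pattern(separator):
    """Build '<CD>sep<TITLE>(.*)</TITLE>sep...sep</CD>' with one group per tag."""
    fields = [f"<{tag}>(.*)</{tag}>" for tag in TAGS]
    return "<CD>" + separator + separator.join(fields) + separator + "</CD>"


def extract(pattern, text):
    """Return the captured groups, or None if the whole string does not match."""
    match = re.fullmatch(pattern, text)
    return match.groups() if match else None


# Only newlines allowed between tags: the space after <CD> breaks the match.
strict = extract(build_pattern(r"\n*"), record)

# Any whitespace allowed between tags: spaces, tabs, \r and \n are all fine.
lenient = extract(build_pattern(r"\s*"), record)

print(strict)
print(lenient)
```

With this sample record the strict pattern yields no match at all, while the `\s*` pattern yields the six fields. If your real file has Windows line endings or indentation, the same reasoning applies, since `\s` also covers `\r` and tabs.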