2016-03-15

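Apache Pig: reading name-value pairs from a data file

I have a sample Pig script that reads a CSV file and dumps it to the screen. However, my real data consists of name-value pairs. How can I read a line of name-value pairs and split each pair so that the name becomes the field name and the value becomes the field value?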

Data:

1,Smith,Bob,Business Development 
2,Doe,John,Developer 
3,Jane,Sally,Tester 

Script:

data = LOAD 'example-data.txt' USING PigStorage(',') 
      AS (id:chararray, last_name:chararray, 
      first_name:chararray, role:chararray); 
DESCRIBE data; 
DUMP data; 

Output:

data: {id: chararray,last_name: chararray,first_name: chararray,role: chararray} 
(1,Smith,Bob,Business Development) 
(2,Doe,John,Developer) 
(3,Jane,Sally,Tester) 

However, given the input below (as name-value pairs), how can I process the data to obtain the same "data" object?

id=1,last_name=Smith,first_name=Bob,role=Business Development 
id=2,last_name=Doe,first_name=John,role=Developer 
id=3,last_name=Jane,first_name=Sally,role=Tester 

Answer


See STRSPLIT:

A = LOAD 'example-data.txt' USING PigStorage(',') AS (f1:chararray,f2:chararray,f3:chararray, f4:chararray); 
B = FOREACH A GENERATE 
       FLATTEN(STRSPLIT(f1,'=',2)) as (n1:chararray,v1:chararray), 
       FLATTEN(STRSPLIT(f2,'=',2)) as (n2:chararray,v2:chararray), 
       FLATTEN(STRSPLIT(f3,'=',2)) as (n3:chararray,v3:chararray), 
       FLATTEN(STRSPLIT(f4,'=',2)) as (n4:chararray,v4:chararray); 
C = FOREACH B GENERATE v1,v2,v3,v4; 
DUMP C; 

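Comment: The point of name-value pairs is that the order does not matter. Could the final GENERATE be written as something like v1 AS VALUE_OF(n1), ...? That is why each name is kept and associated with its value.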
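
On the ordering concern raised in the comment: one possible approach is to load each line whole and pull every value out by its name with the built-in REGEX_EXTRACT, so that the position of a pair in the line no longer matters. This is an untested sketch and not part of the original answer. It assumes that values never contain a comma and that every name appears at most once per line. The relation names raw and data are only illustrative.

```pig
-- Load every line as one chararray so that field order does not matter.
raw = LOAD 'example-data.txt' USING TextLoader() AS (line:chararray);

-- Each pattern looks for "<name>=" at the start of the line or right after
-- a comma and captures everything up to the next comma (group 1).
-- Assumption: values contain no commas.
data = FOREACH raw GENERATE
       REGEX_EXTRACT(line, '^(?:.*,)?id=([^,]*).*$', 1)         AS id:chararray,
       REGEX_EXTRACT(line, '^(?:.*,)?last_name=([^,]*).*$', 1)  AS last_name:chararray,
       REGEX_EXTRACT(line, '^(?:.*,)?first_name=([^,]*).*$', 1) AS first_name:chararray,
       REGEX_EXTRACT(line, '^(?:.*,)?role=([^,]*).*$', 1)       AS role:chararray;

DESCRIBE data;
DUMP data;
```

A name that is missing from a line should yield null for that field instead of shifting the other columns, which is the main difference from the positional STRSPLIT version.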
