2015-03-30 21 views
1

Change the format of a text file with Apache Pig

I have a text file with the following format:

{ (word1),(word2),(word3),....,(wordn) } 

The words are not quoted. I want to use Apache Pig to change the format of this file to just:

word1 
word2 
word3 
wordn  

Is there a way to do this with Apache Pig?

Answer

0

Can you try this?

Input:

{ (word1),(word2),(word3),(wordn) } 

PigScript1:

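-- Load each line as a single bag of one-field tuples
A = LOAD 'input' AS (mybag:{T:(line:chararray)});
-- Join the words with a space; passing the delimiter avoids replacing '_' inside words
B = FOREACH A GENERATE BagToString(mybag.line, ' ');
STORE B INTO 'output';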

Output (stored in the output/part* files):

word1 word2 word3 wordn 

Update: in case you want each word on its own row, use the FLATTEN operator.

PigScript2:

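-- Load each line as a single bag of one-field tuples
A = LOAD 'input' AS (mybag:{T:(line:chararray)});
-- FLATTEN un-nests the bag, producing one output row per tuple
B = FOREACH A GENERATE FLATTEN(mybag);
STORE B INTO 'output1';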

Output:

word1 
word2 
word3 
wordn
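
For anyone who wants to see the difference between the two scripts outside of Pig, here is a rough Python sketch that mimics what they do to one input line. It is not Pig and not part of the answer above. The helper names `parse_bag`, `bag_to_string`, and `flatten` are made up for illustration, and the parser assumes the words themselves contain no parentheses.

```python
import re


def parse_bag(text):
    """Return the words from a bag literal such as '{ (word1),(word2) }'."""
    # Each tuple is written as '(...)'; capture whatever sits between the parentheses.
    return re.findall(r"\(([^()]*)\)", text)


def bag_to_string(words, delimiter=" "):
    """Roughly what PigScript1 does: join every word of the bag into one line."""
    return delimiter.join(words)


def flatten(words):
    """Roughly what PigScript2 does: produce one output row per word."""
    return list(words)


line = "{ (word1),(word2),(word3),(wordn) }"
words = parse_bag(line)

print(bag_to_string(words))       # a single line holding all the words
print("\n".join(flatten(words)))  # one word per line
```

The first print corresponds to the single-line result of PigScript1, the second to the one-word-per-row result of PigScript2.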