2015-11-11

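Pig TRIM and UPPER

I am new to Hadoop programming and am looking for help with Pig. My data is in a simple .txt file delimited by ','. I have two use cases: I want to apply ltrim(rtrim()) to all columns, and convert selected fields to upper case.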

Here is my script:

party = Load '/party_test_pig.txt' USING PigStorage(',') AS(....); 
Trim_party = FOREACH Upper_party GENERATE TRIM(*); 
Upper_party = FOREACH party GENERATE UPPER(col1), UPPER(col2), UPPER(col3); 

Upper_party: after converting to upper case, I want to see all columns, not only the ones that were changed to upper case.

Trim_party: I did some research and found that, to trim all columns, I would have to write a UDF. I could write Trim_party = FOREACH Upper_party GENERATE TRIM(col1)...TRIM(coln);, but that seems inefficient and time-consuming.

Is there another way to make this script work without writing a UDF for trimming?

Thanks in advance.

Answer

It would be easier if you provided a data sample. From what I understand, I would do it this way:

-- Load each line as one string with TextLoader
A = LOAD '/user/guest/Pig/20151112.PigTest.txt' USING TextLoader() AS (line:CHARARRAY);
-- Apply the UPPER transformation to the whole line
B = FOREACH A GENERATE UPPER(line) AS lineUP;
-- Split lines with your delimiter
C = FOREACH B GENERATE FLATTEN(STRSPLIT(lineUP, ',')) AS (col1:CHARARRAY, ... ,coln:CHARARRAY);
-- Trim each column; TRIM keeps spaces that are inside your strings
D = FOREACH C GENERATE TRIM(col1) AS col1T, ..., TRIM(coln) AS colnT;
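
The question also asks to keep every column after uppercasing only some of them. A minimal sketch of one way to do that, assuming a hypothetical five-column schema (the real column list was elided in the question) and Pig 0.9 or later for the project-range expression:

```pig
-- Hypothetical schema: the real column names were elided in the question.
party = LOAD '/party_test_pig.txt' USING PigStorage(',')
        AS (col1:chararray, col2:chararray, col3:chararray,
            col4:chararray, col5:chararray);

-- Uppercase two selected fields and pass every other column through unchanged.
-- `col4 ..` is a project-range expression meaning "col4 and all columns after it".
Upper_party = FOREACH party GENERATE
    col1,
    UPPER(col2) AS col2,
    UPPER(col3) AS col3,
    col4 .. ;

DUMP Upper_party;
```

Listing the untouched columns explicitly, or through a range such as `col4 ..`, keeps them in the output. Only the fields wrapped in UPPER change.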

Hi @AntonyBrd, thank you for your answer. UPPER works fine, but TRIM does not work. – LazyBones

I even ran 'B = FOREACH A GENERATE TRIM(line) AS lineTRIM;' just to verify whether it works, but it failed here too. – LazyBones

RECORD 1: '101,2015-11-11,201,hola,Shah,Rukh,Khan,Shahrukh Khan,SRK,Mr,Male,Married,Hindi,2065,1965-11-02,2065-11-02,1992 -11-02,2065-11-02,100' RECORD 2: '102,2015-11-12,202,hi,Kajol,Tanuja,Mukerjee,Kajol Devgan,KD,Mrs,Female,Married,Hindi,2066,196 -11-03,2065-11-03,1992-11-03,2065-11-03,101' – LazyBones
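
One possible reading of these records, offered as an assumption rather than a confirmed diagnosis: the stray spaces sit inside values such as '1992 -11-02', not at either end. TRIM only removes leading and trailing whitespace, so it would leave those values unchanged. A minimal sketch using the built-in REPLACE, with a hypothetical two-column file and hypothetical field names:

```pig
-- Hypothetical two-column input, e.g. a line such as: Shahrukh Khan,1992 -11-02
people = LOAD '/tmp/people_sample.txt' USING PigStorage(',')
         AS (full_name:chararray, start_date:chararray);

cleaned = FOREACH people GENERATE
    TRIM(full_name)              AS full_name,   -- strips only leading/trailing whitespace
    REPLACE(start_date, ' ', '') AS start_date;  -- removes every space, including inner ones

DUMP cleaned;
```

REPLACE is applied only to the date field here because removing every space from a name field would also join 'Shahrukh Khan' into one word.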