Get n values from key-value pairs using Pig

I have a test file in which the values for each key are separated by commas. How can I get only 10 values for each key using a Pig script?

Sample input: john|str1,str2,str3,str4,str5,str6,str7,str8,str9,str10,str11,str2
Preferred output: john|str1,str2,str3,str4,str5,str6,str7,str8,str9,str10

Source

2013-07-05 Nagaraj Vittal

Please edit your question to include sample input and preferred output, and tell us what you have tried.

Please add the additional information to the question itself rather than posting it as a comment. – Tariq

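There are many different ways to do this, depending on what you have as input and what you need as output. I'm assuming you only want the first ten values and that the rest can be thrown away.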

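This is the way I would do it (CL). It is a little longer than the short way (CF), but the code is clearer to me and it allows a more flexible naming scheme: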

A = LOAD 'myData' USING PigStorage('|') AS (name: chararray, vals: chararray) ; 
B = FOREACH A GENERATE name, STRSPLIT(vals, ',') AS svals:() ; 
CL = FOREACH B GENERATE name, 
         svals.($0, $1, $2, $3, $4, $5, $6, $7, $8, $9) AS ten ; 
         -- ten can have a schema, like ten: (a1: chararray, etc.) 
         -- After giving it a schema, you can also flatten it to 
         -- make it like the output of CF, but with better types

This is the schema and output produced by CL:

CL: {name: chararray,ten:()} 
(john,(str1,str2,str3,str4,str5,str6,str7,str8,str9,str10))

This way is a bit shorter, but it makes it harder to apply a schema to the values:

-- Uses the same A 
B = FOREACH A GENERATE name AS name, FLATTEN(STRSPLIT(vals, ',')) ; 
CF = FOREACH B GENERATE $0 AS name: chararray, $1, $2 .. $10 ;

Schema and output of CF:

CF: {name: chararray,bytearray,bytearray,bytearray,bytearray,bytearray,bytearray,bytearray,bytearray,bytearray,bytearray} 
(john,str1,str2,str3,str4,str5,str6,str7,str8,str9,str10)
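
If the result has to stay in the key|v1,v2,... layout shown in the question, one possible alternative is to trim the value string with a regular expression instead of splitting it. This is only a sketch under some assumptions: that REGEX_EXTRACT is available in your Pig version, that no individual value contains a comma, and that the names R, first_ten and 'myOutput' (which are made up here for illustration) suit your script.

```pig
-- Sketch only; R, first_ten and 'myOutput' are illustrative names.
A = LOAD 'myData' USING PigStorage('|') AS (name: chararray, vals: chararray);

-- Group 1 captures at most nine "value," pieces followed by one more value,
-- i.e. the first ten values. The trailing .* consumes the rest of the line,
-- so the pattern also works if the whole string has to match.
R = FOREACH A GENERATE name,
        REGEX_EXTRACT(vals, '^((?:[^,]*,){0,9}[^,]*).*', 1) AS first_ten;

-- Write the result back with the same '|' delimiter as the input.
STORE R INTO 'myOutput' USING PigStorage('|');
```

Lines with fewer than ten values should pass through unchanged, since the pattern accepts anywhere from zero to nine leading "value," pieces. Unlike CL and CF, this keeps the values as a single chararray, so it is only a good fit when you do not need to address the values individually.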

Source

2013-07-05 17:50:39 mr2ert

Thanks for the help, mr2ert. This script solved my problem completely.
