2014-10-08

Pig script to process the (n-1)th record

Input file structure: records are sorted by timestamp. Expected input file size: 2-3 TB.

timestamp 
============== 
20141014120523 
20141014120534 
20141014120537 
20141014120542 
20141014120549 
20141014120555 
20141014120565 
20141014120570 
20141014120512 
... 
... 

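Using Pig, I need to find the time difference between the timestamp of the Nth record and that of the (N-1)th record (for example, 20141014120534 - 20141014120523 = 11 seconds). I need to iterate over all records and compute each record's time difference from the previous record.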

Sample output:

0 
11 
3 
5 
... 

Please point me to the right resources, documentation, or a solution.

Answers

Can you try this?

input.txt 
20141014120523 
20141014120534 
20141014120537 
20141014120542 
20141014120549 
20141014120555 
20141014120565 
20141014120570 

PigScript: 
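A = LOAD 'input.txt' USING PigStorage() AS (time:long); 
-- RANK with no field list prepends a sequential row number (rank_A) starting at 1 
B = RANK A; 
-- Drop the first row, which has no previous record 
D = FILTER B BY rank_A > 1; 
-- Shift each row number down by one so that row N lines up with row N-1 
E = FOREACH D GENERATE ($0 - 1), $1; 
F = JOIN B BY $0, E BY $0; 
-- Subtracting the raw long values gives seconds only when both timestamps fall in the same minute 
G = FOREACH F GENERATE (E::time - B::time); 
DUMP G; 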

Output: 
(11) 
(3) 
(5) 
(7) 
(6) 
(10) 
(5)
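
One caveat with subtracting the raw long values: it only yields seconds while both timestamps fall within the same minute. For example, 20141014120601 - 20141014120555 evaluates to 46, although the two records are 6 seconds apart. The following is a minimal sketch of a variant that parses the timestamps before subtracting, assuming Pig 0.11 or later (for RANK and the datetime built-ins ToDate and SecondsBetween). The aliases rn, t, and diff are names chosen here for illustration. It also assumes that every line is a valid yyyyMMddHHmmss value, so sample rows such as 20141014120565 (65 seconds) would not parse, and that the row numbers assigned by RANK follow the order of the input file. It has not been run against the 2-3 TB input.

```pig
-- Sketch only: assumes valid yyyyMMddHHmmss timestamps, already in time order.
A = LOAD 'input.txt' USING PigStorage() AS (ts:chararray);
-- RANK with no field list prepends a sequential row number, rank_A
B = RANK A;
-- Parse each timestamp string into a Pig datetime
C = FOREACH B GENERATE rank_A AS rn, ToDate(ts, 'yyyyMMddHHmmss') AS t;
-- Copy of C with row numbers shifted down by one, so row N joins row N-1
D = FOREACH C GENERATE (rn - 1) AS rn, t;
E = JOIN C BY rn, D BY rn;
-- D::t is the later record and C::t the earlier one
F = FOREACH E GENERATE C::rn AS rn, SecondsBetween(D::t, C::t) AS diff;
-- JOIN does not guarantee output order, so sort by row number before dumping
G = ORDER F BY rn;
DUMP G;
```

Each output tuple pairs the row number of the earlier record with the number of seconds to the next one. The ORDER step is there because the order of tuples coming out of a JOIN should not be relied on.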