2017-03-01

AWK usage issue when archiving HDFS files on the Hortonworks Distribution

I am trying to move files older than 3 days from an HDFS directory to an archive folder in HDFS.

AWK script:

hdfs dfs -ls hdfs://companycluster/data/src/purecloud/current | tail -n+2 | xargs -n 8 | 
awk '{ 
DAY_CONV=(60*60*24); 
X ="date +%s";X | getline ED;printf("") > "X";close("X"); 
Y="date -d \"$6\" +%s";Y | getline SD;printf("") > "Y";close("Y"); 
DIFF=(ED-SD)/DAY_CONV; 
print " SD=",SD" ED=",ED," DIFF=",DIFF," INPUT=",$6; 
if (DIFF -gt 3) 
cmd="hdfs dfs -ls " $8; 
system(cmd); 
}' 

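Notes:

  • Once this script works, the cmd variable will hold an mv command.

Issues:

  • The value of variable X is constant.
  • The value of variable Y is constant.
  • I cannot get the day difference between the two dates; DIFF comes out as a fractional value.
  • The if statement in AWK fails because of an incorrect argument.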
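A likely explanation for the constant X and Y: `close("X")` and `close("Y")` close files literally named X and Y (the ones written by `printf("") > "X"`), not the `date` pipes. awk keeps a `cmd | getline` pipe open until `close()` is called with the exact same command string, so after the first record each `getline` reads end-of-file and ED/SD keep their first values. Separately, `$6` inside the string literal `"date -d \"$6\" +%s"` is not expanded by awk; the shell receives `$6` (an empty positional parameter), so `date -d "" +%s` runs, which GNU date treats as the start of today. That is consistent with SD being identical on every line and DIFF (61718 s / 86400) being the fraction of today already elapsed. The sketch below builds the command by concatenation and closes that same string. It assumes GNU `date` (for `-d`); `to_epoch` is a hypothetical helper, and `TZ=UTC` is set only to make the output reproducible.

```shell
#!/bin/sh
# to_epoch (hypothetical helper): print the date column ($6) of each
# "hdfs dfs -ls"-style line together with its epoch seconds.
# Assumes GNU date for -d; TZ=UTC makes the result independent of the host zone.
to_epoch() {
  TZ=UTC awk 'NF >= 8 {
    # Concatenate the field value into the command, so the shell sees
    # the actual date (e.g. 2017-02-27) instead of the literal text $6.
    cmd = "date -d \"" $6 "\" +%s"
    epoch = ""
    cmd | getline epoch
    # close() must get the same string that opened the pipe; otherwise a
    # repeated command reads EOF and epoch is left empty.
    close(cmd)
    print $6, epoch
  }'
}

printf '%s\n' \
  '-rw-r--r-- 3 user hdfs 50687424 2017-02-27 17:06 /x/a' \
  '-rw-r--r-- 3 user hdfs 6728724 2017-03-01 13:05 /x/b' | to_epoch
# prints:
# 2017-02-27 1488153600
# 2017-03-01 1488326400
```

Running the same date twice now yields the same value twice, because each record opens and closes its own pipe.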

Input to AWK:

-rw-r--r-- 3 user hdfs 50687424 2017-02-27 17:06 hdfs://companycluster/data/src/purecloud/current/Conversation.json.240220170000 
-rw-r--r-- 3 user hdfs 49967359 2017-02-27 17:06 hdfs://companycluster/data/src/purecloud/current/Conversation.json.250220170000 
-rw-r--r-- 3 user hdfs 28647041 2017-02-27 17:00 hdfs://companycluster/data/src/purecloud/current/Conversation.json.260220170000 
-rw-r--r-- 3 user hdfs 6728724 2017-03-01 13:05 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1305 
-rw-r--r-- 3 user hdfs 7050854 2017-03-01 13:25 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1325 
-rw-r--r-- 3 user hdfs 6630106 2017-03-01 13:45 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1345 
-rw-r--r-- 3 user hdfs 6766650 2017-03-01 14:05 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1405 
-rw-r--r-- 3 user hdfs 6486095 2017-03-01 14:25 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1425 
-rw-r--r-- 3 user hdfs 6350705 2017-03-01 14:45 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1445 
-rw-r--r-- 3 user hdfs 6082589 2017-03-01 15:05 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1505 
-rw-r--r-- 3 user hdfs 6417281 2017-03-01 15:25 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1525 
-rw-r--r-- 3 user hdfs 6519949 2017-03-01 15:45 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1545 
-rw-r--r-- 3 user hdfs 6988534 2017-03-01 16:05 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1605 
-rw-r--r-- 3 user hdfs 6734459 2017-03-01 16:25 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1625 
-rw-r--r-- 3 user hdfs 6842766 2017-03-01 16:45 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1645 
-rw-r--r-- 3 user hdfs 6575513 2017-03-01 17:05 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1705 
-rw-r--r-- 3 user hdfs 6574050 2017-03-01 17:25 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1725 
-rw-r--r-- 3 user hdfs 50215096 2017-02-27 18:01 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-02-27_1801 
-rw-r--r-- 3 user hdfs 50985760 2017-02-27 18:18 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-02-27_1818 
-rw-r--r-- 3 user hdfs 58206776 2017-02-28 00:01 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-02-28_0001 
-rw-r--r-- 3 user hdfs 58823497 2017-02-28 06:01 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-02-28_0601 
-rw-r--r-- 3 user hdfs 61591660 2017-02-28 12:01 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-02-28_1201 
-rw-r--r-- 3 user hdfs 59703667 2017-03-01 10:40 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-02-28_1801 
-rw-r--r-- 3 user hdfs 59160075 2017-03-01 10:47 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-03-01_0001 
-rw-r--r-- 3 user hdfs 61812121 2017-03-01 10:48 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-03-01_0601 
-rw-r--r-- 3 user hdfs 63804772 2017-03-01 12:01 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-03-01_1201 

Output from AWK (including the debug print):

SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-02-27 
-rw-r--r-- 3 user hdfs 50687424 2017-02-27 17:06 hdfs://companycluster/data/src/purecloud/current/Conversation.json.240220170000 
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-02-27 
-rw-r--r-- 3 user hdfs 49967359 2017-02-27 17:06 hdfs://companycluster/data/src/purecloud/current/Conversation.json.250220170000 
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-02-27 
-rw-r--r-- 3 user hdfs 28647041 2017-02-27 17:00 hdfs://companycluster/data/src/purecloud/current/Conversation.json.260220170000 
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01 
-rw-r--r-- 3 user hdfs 6728724 2017-03-01 13:05 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1305 
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01 
-rw-r--r-- 3 user hdfs 7050854 2017-03-01 13:25 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1325 
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01 
-rw-r--r-- 3 user hdfs 6630106 2017-03-01 13:45 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1345 
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01 
-rw-r--r-- 3 user hdfs 6766650 2017-03-01 14:05 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1405 
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01 
-rw-r--r-- 3 user hdfs 6486095 2017-03-01 14:25 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1425 
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01 
-rw-r--r-- 3 user hdfs 6350705 2017-03-01 14:45 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1445 
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01 
-rw-r--r-- 3 user hdfs 6082589 2017-03-01 15:05 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1505 
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01 
-rw-r--r-- 3 user hdfs 6417281 2017-03-01 15:25 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1525 
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01 
-rw-r--r-- 3 user hdfs 6519949 2017-03-01 15:45 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1545 
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01 
-rw-r--r-- 3 user hdfs 6988534 2017-03-01 16:05 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1605 
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01 
-rw-r--r-- 3 user hdfs 6734459 2017-03-01 16:25 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1625 
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01 
-rw-r--r-- 3 user hdfs 6842766 2017-03-01 16:45 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1645 
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01 
-rw-r--r-- 3 user hdfs 6575513 2017-03-01 17:05 hdfs://companycluster/data/src/purecloud/current/conversation.json.2017-03-01_1705 
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-02-27 
-rw-r--r-- 3 user hdfs 50215096 2017-02-27 18:01 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-02-27_1801 
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-02-27 
-rw-r--r-- 3 user hdfs 50985760 2017-02-27 18:18 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-02-27_1818 
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-02-28 
-rw-r--r-- 3 user hdfs 58206776 2017-02-28 00:01 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-02-28_0001 
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-02-28 
-rw-r--r-- 3 user hdfs 58823497 2017-02-28 06:01 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-02-28_0601 
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-02-28 
-rw-r--r-- 3 user hdfs 61591660 2017-02-28 12:01 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-02-28_1201 
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01 
-rw-r--r-- 3 user hdfs 59703667 2017-03-01 10:40 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-02-28_1801 
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01 
-rw-r--r-- 3 user hdfs 59160075 2017-03-01 10:47 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-03-01_0001 
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01 
-rw-r--r-- 3 user hdfs 61812121 2017-03-01 10:48 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-03-01_0601 
SD= 1488286800 ED= 1488348518 DIFF= 0.714329 INPUT= 2017-03-01 
-rw-r--r-- 3 user hdfs 63804772 2017-03-01 12:01 hdfs://companycluster/data/src/purecloud/current/conversation_6hr.json.2017-03-01_1201 

Distribution information:

  • Hortonworks
  • Hadoop 2.7.1.2.4.0.0-169
  • Linux DH01 aaaaaaaaaaaaa.x86_64 #1 SMP Sun Jul 27 15:55:46 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux

Any input would be very helpful.


'-gt' is a bash operator. In 'awk' you should use '>'. – oliv
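To illustrate the comment: awk has no `-gt` operator, so `DIFF -gt 3` is parsed as `(DIFF - gt) 3`. The unset variable `gt` (0) is subtracted and the result is concatenated with `3`, producing a non-empty string such as `"0.73"`, which awk treats as true. Note also that without braces only `cmd = ...` belongs to the `if`; `system(cmd)` runs for every line regardless. A minimal check, using hypothetical helpers `with_gt` and `with_op`:

```shell
#!/bin/sh
# Compare a day difference against 3 in awk, once with the bash-style -gt
# and once with awk's own numeric > operator.
with_gt() { awk -v diff="$1" 'BEGIN { if (diff -gt 3) print "true"; else print "false" }'; }
with_op() { awk -v diff="$1" 'BEGIN { if (diff > 3) print "true"; else print "false" }'; }

with_gt 0.7   # true: "diff -gt 3" is (diff - gt) concatenated with 3, i.e. "0.73"
with_op 0.7   # false: numeric comparison
with_op 5     # true: numeric comparison
```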


@Kfactor21 - can you paste your expected output? –


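@VIPINKUMAR .. For the input I posted there should be no output at all: every file in it is less than 3 days old, so the 'hdfs dfs -ls' command inside the 'IF' statement should never execute. It runs anyway, because the DIFF variable does not hold the correct date difference. – Kfactor21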

Answer

hdfs dfs -ls hdfs://companycluster/data/src/purecloud/current | tail -n +2 | xargs -n 8 \
| awk '
     # systime() and mktime() are GNU awk (gawk) extensions, not POSIX awk
     BEGIN {
     # take the time reference (3 days before now)
     R = systime() - 3 * 86400
     }
     # for each line
     {
     # mktime expects the format "YYYY MM DD HH MM SS [DST]"
     # build it from the date ($6) and time ($7) columns
     t = $6 " " $7 " 00"; gsub(/[-:]/, " ", t)
     # convert to seconds since the epoch
     T = mktime(t)
     # older than the reference time?
     if (T < R) {
     print "Included line: " $0

     # do whatever action you need here
     cmd = "hdfs dfs -ls " $8
     system(cmd)
     }
     else {
     print "Discarded line: " $0
     }
     }'
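The answer keeps `hdfs dfs -ls` as a placeholder action, while the question's goal is to move old files into an archive folder. `hdfs dfs -mv` accepts several sources when the destination is a directory, so one option is to collect the selected paths and move them in batches instead of calling `system()` once per file, since every `hdfs dfs` call starts a new Java client. A dry-run sketch; `archive_paths`, `HDFS_CMD` and the archive path are hypothetical, and paths containing whitespace or quotes are not handled:

```shell
#!/bin/sh
# archive_paths ARCHIVE_DIR (hypothetical helper)
# Reads one HDFS path per line on stdin and moves them into ARCHIVE_DIR with
# "hdfs dfs -mv SRC... DEST", letting xargs pack many sources into each call.
# Dry run by default: HDFS_CMD="echo hdfs" only prints the commands;
# set HDFS_CMD=hdfs to really move the files.
archive_paths() {
  HDFS_CMD="${HDFS_CMD:-echo hdfs}" xargs sh -c '
    dest=$1; shift
    # some xargs implementations run the command once even with no input
    [ "$#" -gt 0 ] || exit 0
    $HDFS_CMD dfs -mv "$@" "$dest"
  ' sh "$1"
}

printf '%s\n' /data/current/a.json /data/current/b.json | archive_paths /data/archive
# prints: hdfs dfs -mv /data/current/a.json /data/current/b.json /data/archive
```

To combine it with the answer, print only `$8` for included lines, pipe that into `archive_paths`, check the dry-run output, and then rerun with `HDFS_CMD=hdfs`.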

Comments:

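  • The AWK code is self-commented.
  • The input to AWK can certainly be optimized (AWK can do the job of tail very well, and xargs is certainly not mandatory here [no HDFS available to test from here]).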
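Along the lines of that comment, awk can skip the `Found N items` header itself (here via `NF >= 8`), which makes the `tail` and `xargs` stages unnecessary. The sketch below also swaps in a different technique from the answer: instead of gawk's `mktime()`, it converts `YYYY-MM-DD` to a day count with the days-from-civil algorithm in plain POSIX awk, so the age is always a whole number of calendar days (time of day and timezone are ignored). `old_files` is a hypothetical helper; the reference date is passed in to keep the example reproducible.

```shell
#!/bin/sh
# old_files TODAY MAX_AGE (hypothetical helper)
# Reads "hdfs dfs -ls"-style lines on stdin and prints "AGE PATH" for every
# file whose date column ($6) is more than MAX_AGE calendar days before TODAY.
old_files() {
  awk -v today="$1" -v max_age="$2" '
    # Days since 1970-01-01 for a YYYY-MM-DD string (proleptic Gregorian
    # calendar, days_from_civil algorithm; assumes years >= 0).
    function days(s,   p, y, m, d, era, yoe, doy, doe) {
      split(s, p, "-"); y = p[1] + 0; m = p[2] + 0; d = p[3] + 0
      if (m <= 2) y--
      era = int(y / 400)
      yoe = y - era * 400
      doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
      doe = yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
      return era * 146097 + doe - 719468
    }
    BEGIN { now = days(today) }
    # The "Found N items" header has fewer than 8 fields and is skipped.
    NF >= 8 {
      age = now - days($6)    # whole days, never fractional
      if (age > max_age) print age, $8
    }'
}

printf '%s\n' \
  'Found 3 items' \
  '-rw-r--r-- 3 user hdfs 50687424 2017-02-27 17:06 /data/a' \
  '-rw-r--r-- 3 user hdfs 6728724 2017-03-01 13:05 /data/b' \
  '-rw-r--r-- 3 user hdfs 1000 2017-02-20 09:00 /data/c' | old_files 2017-03-01 3
# prints: 9 /data/c
```

Against the live directory this would be called as `hdfs dfs -ls hdfs://companycluster/data/src/purecloud/current | old_files "$(date +%F)" 3`; with the question's listing and a reference date of 2017-03-01 it prints nothing, matching the expected result.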

Thanks for your help. The script performs the required archiving. – Kfactor21