Modify a field in the lines of a text file that are not unique


I am working on a script that needs to take the repeated lines from a single text file and change the value of the date field, and only the date field. The field separator is TAB.

# cat enviando4 
1414743351  2014-11-01 09:00:00 
1414743351  2014-10-31 09:15:51 
1414743351  2014-10-30 23:00:00 
1414743351  2014-10-31 09:15:51 
1414743351  2014-10-30 23:00:00 
1414743351  2014-10-31 10:25:00 
1414743351  2014-10-31 09:15:51 
1414743351  2014-11-01 10:25:00

I sort the lines by date:

/bin/sort enviando4 -k2 -t $'\t' -o enviando4

# cat enviando4 
1414743351  2014-10-30 23:00:00 
1414743351  2014-10-30 23:00:00 
1414743351  2014-10-31 09:15:51 
1414743351  2014-10-31 09:15:51 
1414743351  2014-10-31 09:15:51 
1414743351  2014-10-31 10:25:00 
1414743351  2014-11-01 09:00:00 
1414743351  2014-11-01 10:25:00

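Now I need to add at least 4 minutes (never subtract) to every date that has at least one duplicate, so that I end up with only unique dates. It would look like this: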

# cat enviando4 
1414743351  2014-10-30 23:04:00 --> add 4 
1414743351  2014-10-30 23:00:00 --> no change 
1414743351  2014-10-31 09:19:51 --> add 4 
1414743351  2014-10-31 09:23:51 --> add 8 
1414743351  2014-10-31 09:15:51 --> no change 
1414743351  2014-10-31 10:25:00 --> unique, no change 
1414743351  2014-11-01 09:00:00 --> unique, no change 
1414743351  2014-11-01 10:25:00 --> unique, no change

I also need to verify that these changes do not create new duplicate values. I am stuck here. Thanks.

Source: Fer Nando, 2014-10-31

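If you reverse the sort order, you will need to add the 4 minutes to the *last* duplicate line. This is easier to do in something like awk: keep one line in a buffer, check for a change in the key, emit the held line, ... – Dinesh 2014-10-31 00:25:12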

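Your task is not difficult. Bash has access to excellent date manipulation utilities (the script below relies on GNU date). What you need to do is sort the original list, then read each line of the sorted file, compare the date/time to the previous date/time and, using a counter, add an offset of counter * 4 min and write the new date/time to your output file. There are many ways to handle the time adjustment. The simplest is to convert the date/time string to seconds since the epoch, add the offset to the duplicate's time, and convert the result back to the desired date/time format.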
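To make the epoch round trip concrete, here is a minimal sketch of just that step. The helper name add_seconds is my own (it is not part of the answer's script), it needs GNU date for the -d option, and it treats the timestamps as UTC, which is an assumption made so that daylight-saving changes cannot distort the arithmetic.

```shell
#!/bin/bash
# Sketch only: add_seconds is a hypothetical helper, not part of the original answer.
# Requires GNU date (the -d option). Timestamps are treated as UTC (an assumption).

add_seconds() {
    local stamp="$1" offset="$2" epoch
    epoch=$(TZ=UTC date -d "$stamp" +%s) || return 1   # date/time -> seconds since epoch
    TZ=UTC date -d "@$((epoch + offset))" +"%F %T"     # seconds -> "YYYY-MM-DD HH:MM:SS"
}

add_seconds "2014-10-31 09:15:51" 240    # first duplicate:  +4 minutes
add_seconds "2014-10-31 09:15:51" 480    # second duplicate: +8 minutes
```

The two calls print 2014-10-31 09:19:51 and 2014-10-31 09:23:51. Because the arithmetic is done in seconds, an offset that crosses midnight rolls the date over as well.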

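The example below shows one way to do this. Several of the operations could be combined, but I have kept the offset calculation separate to make it more readable. The script takes the input file as its first argument (I set it to default to dat/env4.dat for my testing; change it to suit your setup). The script then sorts the input into a temporary file, reads the temporary file, adjusts the time of each duplicate, writes the output to inputfile.out, and removes the temporary file before exiting. Let me know if you have any questions: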

#!/bin/bash 

ifn="${1:-dat/env4.dat}"   # set input filename (ifn) and validate 

[ -r "$ifn" ] || { 
    printf "\n Error: input file not readable. Usage: %s [<filename> (dat/env4.dat)]\n\n" "${0//*\//}" >&2 
    exit 1 
} 

## initialize variables 
tfn="/tmp/${ifn//*\//}.tmp"   # set temp filename (tfn) 
ofn="${ifn}.out"     # set output filename (ofn) 
:> "$ofn"       # truncate output file 
pdate=0        # initialize prior date 
cnt=0        # counter variable 
tos=240        # time offset in seconds (4 min.) 
tse=0        # time since epoch in seconds 

sort "$ifn" > "$tfn"    # sort input file into temp file & validate 

[ -r "$tfn" ] || { 
    printf "\n Error: sort failed to produce a tmp file or tmp file not readable\n\n" >&2 
    exit 1 
} 

## read temp file into index/idate and add 4 min to each successive duplicate 
while read -r index idate || [ -n "$idate" ]; do 

    if [ "$pdate" = "$idate" ]; then 
     tse=$(date -d "$idate" +%s) # get time since epoch for idate 
     cnt=$((cnt+1))    # increase counter 
     nos=$((cnt*tos))   # set new time offset (not Nitrous Oxide) 
     ntm=$((tse+nos))   # set new time including offset 
     # write new time to output 
     printf "%s\t%s\n" "$index" "$(date -d "@${ntm}" +"%F %T")" >> "$ofn" 
    else 
     cnt=0; nos=0    # reset counter and new time offset 
     # write output unchanged 
     printf "%s\t%s\n" "$index" "$idate" >> "$ofn" 
    fi 

    pdate="$idate"     # save current date/time as prior date/time 

done <"$tfn" 

[ -r "$tfn" ] && rm "$tfn"   # remove temp file

Input file:

$ cat dat/env4.dat 
1414743351  2014-11-01 09:00:00 
1414743351  2014-10-31 09:15:51 
1414743351  2014-10-30 23:00:00 
1414743351  2014-10-31 09:15:51 
1414743351  2014-10-30 23:00:00 
1414743351  2014-10-31 10:25:00 
1414743351  2014-10-31 09:15:51 
1414743351  2014-11-01 10:25:00

Output file:

$ cat dat/env4.dat.out 
1414743351  2014-10-30 23:00:00 
1414743351  2014-10-30 23:04:00 
1414743351  2014-10-31 09:15:51 
1414743351  2014-10-31 09:19:51 
1414743351  2014-10-31 09:23:51 
1414743351  2014-10-31 10:25:00 
1414743351  2014-11-01 09:00:00 
1414743351  2014-11-01 10:25:00
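The question also asks for a check that the adjusted times did not collide with times already in the file (for example, a duplicate of 09:15:51 shifted to 09:19:51 when another 09:19:51 line exists). One way to check, as a sketch: list every date/time value that still occurs more than once. The helper name list_dup_dates is hypothetical, and it assumes a TAB-separated file whose second field is the date/time.

```shell
#!/bin/bash
# Sketch only: list_dup_dates is a hypothetical helper. It assumes a
# TAB-separated file whose second field is the date/time.

list_dup_dates() {
    # keep only field 2, sort it, and print each value that occurs more than once
    cut -f2 "$1" | sort | uniq -d
}

# Example with a throwaway file: 09:19:51 appears twice, so it is reported.
tmp=$(mktemp)
printf '1414743351\t2014-10-31 09:15:51\n'  > "$tmp"
printf '1414743351\t2014-10-31 09:19:51\n' >> "$tmp"
printf '1414743351\t2014-10-31 09:19:51\n' >> "$tmp"
list_dup_dates "$tmp"
rm -f "$tmp"
```

Empty output means every date/time is unique. If anything is printed, the adjustment can be run again on the output file until the list is empty.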

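Note: if you want to flip the duplicates so that the larger offset times appear first, you should do that as a separate operation on the output file. Doing it inside the offset while loop overcomplicates the logic for this question. If you do want to include the additional code in the offset while loop, the basic approach is to store the prior date and any matching dates in an array, then apply the offsets to the array's date/time values and write them out in reverse order. Unset the array each time a new date/time is encountered.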
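For anyone who wants to try the array approach from the note, here is a rough sketch. The function names (shift_stamp, flush_group, reverse_offsets) are my own and not from the script above. It assumes the input is already sorted, has one "index TAB date time" record per line, that GNU date is available, and it treats timestamps as UTC.

```shell
#!/bin/bash
# Sketch only: shift_stamp, flush_group and reverse_offsets are hypothetical
# names. Input is assumed to be sorted already, one "index<TAB>date time"
# record per line. Requires GNU date; timestamps are treated as UTC (assumption).

tos=240                          # offset per duplicate, in seconds

shift_stamp() {                  # shift_stamp "date time" seconds
    local epoch
    epoch=$(TZ=UTC date -d "$1" +%s) || return 1
    TZ=UTC date -d "@$((epoch + $2))" +"%F %T"
}

flush_group() {                  # write the buffered records, largest offset first
    local i
    for ((i = ${#group[@]} - 1; i >= 0; i--)); do
        printf '%s\t%s\n' "${group[i]}" "$(shift_stamp "$pdate" $((i * tos)))"
    done
    group=()                     # empty the buffer for the next date/time
}

reverse_offsets() {              # reads sorted records on stdin
    local index idate
    group=(); pdate=""
    while read -r index idate || [ -n "$index" ]; do
        [ -n "$index" ] || continue                    # skip blank lines
        if [ "$idate" != "$pdate" ] && [ ${#group[@]} -gt 0 ]; then
            flush_group                                # date/time changed
        fi
        group+=("$index")                              # buffer this record
        pdate="$idate"
    done
    if [ ${#group[@]} -gt 0 ]; then flush_group; fi    # last group
}

printf '%s\t%s\n' 1414743351 "2014-10-31 09:15:51" \
                  1414743351 "2014-10-31 09:15:51" \
                  1414743351 "2014-10-31 10:25:00" | reverse_offsets
```

With that sample input the output has 09:19:51 first, then the unchanged 09:15:51, then 10:25:00. Only the index is buffered, because every record in a group shares the same date/time (held in pdate).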

Addendum: including the e-mail and an adjustment field

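If you want to extend the output so that it begins with an e-mail address and includes the time adjustment between the date portion and the time portion of the new date field, you can do so fairly easily: add the e-mail at the start, split the new date string returned by date into a date part and a time part, and insert 00:0n:00 between the two on output. It makes no difference whether you use printf or echo. printf is more flexible, but there are times when echo has its advantages too.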

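Note: in the code below, I form 00:0n:00 (with n being 4, 8, etc.), which assumes only 2 duplicates. If there are 3 or more, you will have to rework the adjustment logic to form 00:nn:00 when the adjustment time is greater than 8 minutes (e.g. 12, 16, 20, ... for the 3rd, 4th, 5th, ... duplicate).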
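One possible way to lift that limitation is to let printf do the zero-padding instead of hard-coding the leading 0. This is only a sketch: fmt_adjust is a hypothetical helper name, and it takes the adjustment in minutes.

```shell
#!/bin/bash
# Sketch only: fmt_adjust is a hypothetical helper. It takes the adjustment
# in minutes and zero-pads it, so 12 becomes 00:12:00 rather than 00:012:00.

fmt_adjust() {
    local minutes="$1"
    # hours and minutes are each padded to two digits; seconds are always 00
    printf '%02d:%02d:00\n' $((minutes / 60)) $((minutes % 60))
}

for cnt in 0 1 2 3 16; do
    fmt_adjust $((cnt * 4))      # 4 minutes per duplicate, as with adjtm=4
done
```

The loop prints 00:00:00, 00:04:00, 00:08:00, 00:12:00 and 01:04:00. In the addendum script this would replace the literal 00:0${adj}:00, for example ncmb="$nd1 $(fmt_adjust "$adj") $nd2".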

Let me know if you have further questions.

## beginning part of script unchanged 
# tse=0        # time since epoch in seconds 
email="[email protected]"    # email to output 
adjtm=4        # simple value to provide adjustment in 00:04:00, etc. 

sort "$ifn" > "$tfn"    # sort input file into temp file & validate 

[ -r "$tfn" ] || { 
    printf "\n Error: sort failed to produce a tmp file or tmp file not readable\n\n" >&2 
    exit 1 
} 

## read temp file into index/idate and add 4 min to each successive duplicate 
while read -r index idate || [ -n "$idate" ]; do 

    if [ "$pdate" = "$idate" ]; then 
     tse=$(date -d "$idate" +%s) # get time since epoch for idate 
     cnt=$((cnt+1))    # increase counter 
     adj=$((cnt*adjtm))   # compute 4, 8, ... for 00:0n:00 output 
     nos=$((cnt*tos))   # set new time offset (not Nitrous Oxide) 
     ntm=$((tse+nos))   # set new time including offset 
     ndt="$(date -d "@${ntm}" +"%F %T")" # new date/time value 
     nd1=${ndt% *}    # date portion (first field) of ndt 
     nd2=${ndt#* }    # time portion (second field) of ndt 
     ncmb="$nd1 00:0${adj}:00 $nd2" # new combined "date 00:0n:00 time" string 
     # write new time to output 
     printf "%s\t%s\t%s\n" "$email" "$index" "$ncmb" >> "$ofn" 
    else 
     cnt=0; nos=0    # reset counter and new time offset 
     nd1=${idate% *}    # date portion (first field) of idate 
     nd2=${idate#* }    # time portion (second field) of idate 
     ncmb="$nd1 00:00:00 $nd2" # new combined "date 00:00:00 time" string (no adj) 
     # write output with a zero adjustment (00:00:00) 
     printf "%s\t%s\t%s\n" "$email" "$index" "$ncmb" >> "$ofn" 
    fi 

    pdate="$idate"     # save current date as prior date 

done <"$tfn" 

[ -r "$tfn" ] && rm "$tfn"   # remove temp file

Output file (with the same input):

$ bash env4-2.sh 
[email protected] 1414743351  2014-10-30 00:00:00 23:00:00 
[email protected] 1414743351  2014-10-30 00:04:00 23:04:00 
[email protected] 1414743351  2014-10-31 00:00:00 09:15:51 
[email protected] 1414743351  2014-10-31 00:04:00 09:19:51 
[email protected] 1414743351  2014-10-31 00:08:00 09:23:51 
[email protected] 1414743351  2014-10-31 00:00:00 10:25:00 
[email protected] 1414743351  2014-11-01 00:00:00 09:00:00 
[email protected] 1414743351  2014-11-01 00:00:00 10:25:00

Source: 2014-10-31 05:13:08

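Thanks for your script :-). It helped me a lot. I have been trying for a while to adapt the script to my needs, and I have run into a problem with: printf "%s\t%s\n" "$index" "$(date -d "@${ntm}" +"%F %T")" >> "$ofn". I use more than two fields. This is the output: [email protected] 186808 2014-11-02 00:04:00 12:06:00. The old date was 2014-11-02 12:06:00, and I get 00:04:00 and 12:06:00 instead of 2014-11-02 12:10:00. I can post my code if necessary. – 2014-11-01 22:51:52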

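No need to post. All of the changes will be controlled by the read, and then by the concatenation after the read. Before I write the addendum, confirm that basically we just need to disregard the '00:04:00' and treat the date string as '2014-11-02 12:06:00', for the purposes of checking duplicates and adjusting the time, out of your total string '[email protected] 186808 2014-11-02 00:04:00 12:06:00'. – 2014-11-02 00:17:35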

Yes. My original string is: PG 1358 [email protected] 186808 2014-11-02 12:06:00 0 2, and in the end it should look like: PG 1358 [email protected] 186808 2014-11-02 12:10:00 0 2. The line I use to write it back is: printf "%s\t%s\t%s\t%s\t%s\t%s\t%s\n" "$SISTEMA" "$MENSAXE_ID" "$EMAIL" "$NUMERO_DESTINATARIOS" "$(/bin/date -d "@${NTM}" +"%F %T")" "$ESTADO" "$GRUPO" >> pendientes.sorted, but I get: pg 1358 [email protected] 186808 2014-11-02 00:04:00 12:06:00 0 2. Your script works really well. Thx – 2014-11-02 00:37:12
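For a seven-field record like the one in the comment above, a possible adaptation is to split on TAB only, so that the date and time stay together in one variable. This is a sketch under assumptions, not the answer author's fix: the field names are borrowed from the printf in the comment, the seven-field TAB-separated layout and the address user@example.com are made up for illustration, the input must already be sorted by the date field, GNU date is required, and timestamps are treated as UTC.

```shell
#!/bin/bash
# Sketch only: the field names come from the printf in the comment above; the
# seven-field TAB-separated layout is an assumption. Requires GNU date.
# Timestamps are treated as UTC (an assumption).

tos=240    # 4 minutes in seconds

adjust_records() {               # reads records sorted by date on stdin
    local sistema mensaxe_id email numero fecha estado grupo
    local pdate="" cnt=0 epoch
    while IFS=$'\t' read -r sistema mensaxe_id email numero fecha estado grupo; do
        if [ "$fecha" = "$pdate" ]; then
            cnt=$((cnt + 1))     # another duplicate of the same date/time
        else
            cnt=0; pdate="$fecha"
        fi
        epoch=$(TZ=UTC date -d "$fecha" +%s) || continue   # skip unparsable dates
        fecha=$(TZ=UTC date -d "@$((epoch + cnt * tos))" +"%F %T")
        printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
            "$sistema" "$mensaxe_id" "$email" "$numero" "$fecha" "$estado" "$grupo"
    done
}

# user@example.com is a placeholder address for the demonstration
printf 'PG\t1358\tuser@example.com\t186808\t2014-11-02 12:06:00\t0\t2\n%.0s' 1 2 |
    adjust_records
```

The demonstration feeds the same record twice; the first copy keeps 2014-11-02 12:06:00 and the second becomes 2014-11-02 12:10:00. One caveat: TAB counts as whitespace for read, so consecutive TABs collapse and an empty field would shift the later fields to the left.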
