2016-11-23

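Minimum and maximum time of each task in a Unix file?

I have a file that contains the task name in the first column and the time taken to complete the task in the second column, as follows: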

Task2, 3421 
Task3, 3300 
Task1, 1000 
Task2, 1100 
Task3, 1200 
Task3, 1209 
Task4, 1299 
Task3, 1289 
Task1, 1389 
Task2, 1211 
Task5, 1216 
Task2, 1416 
Task1, 2100 
Task6, 2416 
Task5, 2216 
Task7, 1116 

Now I have to find the minimum and maximum time taken by each task and output them in the following format:

task, min time, max time

For example:

Task1, 1000, 2100 (from the data given above) 

See [here](http://stackoverflow.com/a/40780716/6769931) for a quasi "awk-free" answer. ;)

Answers

You can use awk:

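awk '
    BEGIN { FS = " *, *"; OFS = ", " }                  # split on the comma, trim blanks
    !($1 in max) || $2+0 > max[$1] { max[$1] = $2+0 }   # first or larger time: new max
    !($1 in min) || $2+0 < min[$1] { min[$1] = $2+0 }   # first or smaller time: new min
    END { for (k in max) print k, min[k], max[k] }      # for-in order is unspecified
    ' input.txt | sort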

You get:

Task1, 1000, 2100 
Task2, 1100, 3421 
Task3, 1200, 3300 
Task4, 1299, 1299 
Task5, 1216, 2216 
Task6, 2416, 2416 
Task7, 1116, 1116 

I put these lines in a script and ran it, and got a syntax error: awk: syntax error near line 3 awk: bailing out near line 3 – Vicky

Try again, after the fix.

Same error: awk: syntax error near line 3 awk: bailing out near line 3 – Vicky
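
The thread never says which awk was used. The "syntax error near line 3 / bailing out near line 3" wording is what the old /usr/bin/awk on Solaris prints, and on such systems `nawk` or /usr/xpg4/bin/awk is the usual workaround; that is an assumption about the asker's setup, not something confirmed here. As a hedged sketch, the same min/max logic can be written with only basic awk constructs (a plain `seen` flag instead of `in` tests, string concatenation instead of OFS). The function name `minmax_plain_awk` is hypothetical, and whether it runs on that old awk is untested.

```shell
# Hypothetical helper: minmax_plain_awk FILE
# Uses only basic awk: -F, a "seen" flag, comparisons and concatenation.
minmax_plain_awk() {
    awk -F, '
    { t = $2 + 0 }                          # numeric time; the leading blank is ignored
    seen[$1] == 0 { seen[$1] = 1; min[$1] = t; max[$1] = t }   # first line of a task
    t > max[$1] { max[$1] = t }             # larger time: new max
    t < min[$1] { min[$1] = t }             # smaller time: new min
    END { for (k in seen) print k ", " min[k] ", " max[k] }
    ' "$1" | sort                           # for-in order is unspecified, so sort
}
```

Usage: `minmax_plain_awk input.txt`.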

Another way to do this is to sort by column 1 and then by column 2, and take the first and the last value for each task, like this:

awk -F, '{arr[$1]=arr[$1] " " $2} END {for(key in arr) print key, arr[key]}' <(sort -t, -k1,1 -k2n file) | awk '{OFS=", "; print $1, $2, $NF}' | sort

Sample run:

$ cat file
Task2, 3421
Task3, 3300
Task1, 1000
Task2, 1100
Task3, 1200
Task3, 1209
Task4, 1299
Task3, 1289
Task1, 1389
Task2, 1211
Task5, 1216
Task2, 1416
Task1, 2100
Task6, 2416
Task5, 2216
Task7, 1116
$ sort -t, -k1,1 -k2n file
Task1, 1000
Task1, 1389
Task1, 2100
Task2, 1100
Task2, 1211
Task2, 1416
Task2, 3421
Task3, 1200
Task3, 1209
Task3, 1289
Task3, 3300
Task4, 1299
Task5, 1216
Task5, 2216
Task6, 2416
Task7, 1116
$ awk -F, '{arr[$1]=arr[$1] " " $2} END {for(key in arr) print key, arr[key]}' <(sort -t, -k1,1 -k2n file) | awk '{OFS=", "; print $1, $2, $NF}' | sort
Task1, 1000, 2100
Task2, 1100, 3421
Task3, 1200, 3300
Task4, 1299, 1299
Task5, 1216, 2216
Task6, 2416, 2416
Task7, 1116, 1116

Why does Task4 have the same min and max? – Vicky

@user3369871 Because Task4 has only one entry. – ritesht93

Task7 also has only one entry, but in the output Task7 is missing the max time. – Vicky
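
Taking the first and last value per task only gives the min and max if the times are ordered numerically. A plain text sort compares digits character by character, which happens to work for the sample because every time has four digits. A small illustration with hypothetical times of different lengths (the variable names are illustrative only):

```shell
# Hypothetical input: one task whose times have different numbers of digits.
sample='Task8, 999
Task8, 1000'

# Whole-line text sort: "1000" comes before "999" because "1" < "9",
# so "first line = min" would wrongly report 1000.
text_sorted=$(printf '%s\n' "$sample" | sort)

# Numeric sort on the time (field 2 of the comma-separated line) keeps 999 first.
numeric_sorted=$(printf '%s\n' "$sample" | sort -t, -k1,1 -k2n)

printf 'text sort:\n%s\nnumeric sort:\n%s\n' "$text_sorted" "$numeric_sorted"
```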

Using gawk's arrays of arrays:

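gawk 'BEGIN { FS = " *, *"; OFS = ", " }
     !($1 in a) { a[$1]["min"] = a[$1]["max"] = $2 + 0; next }   # first time for this task
     $2 + 0 > a[$1]["max"] { a[$1]["max"] = $2 + 0 }
     $2 + 0 < a[$1]["min"] { a[$1]["min"] = $2 + 0 }
     END { for (i in a) print i, a[i]["min"], a[i]["max"] }' file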
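
Arrays of arrays are a gawk extension (gawk 4.0 and later), so the answer above will not run in mawk or a BWK awk. A sketch of the same idea for other awks, assuming POSIX-style `a[task, "min"]` subscripts (awk joins the two parts into a single key with SUBSEP); the function name `minmax_subsep` and the helper array `tasks` are hypothetical:

```shell
# Hypothetical helper: minmax_subsep FILE
# a[task, "min"] / a[task, "max"] emulate a two-level array via SUBSEP.
minmax_subsep() {
    awk 'BEGIN { FS = " *, *"; OFS = ", " }
         { t = $2 + 0 }                                      # numeric time
         !(($1, "min") in a) { a[$1, "min"] = t; a[$1, "max"] = t; tasks[$1] = 1 }
         t < a[$1, "min"] { a[$1, "min"] = t }
         t > a[$1, "max"] { a[$1, "max"] = t }
         END { for (k in tasks) print k, a[k, "min"], a[k, "max"] }' "$1" | sort
}
```

Usage: `minmax_subsep file`. The separate `tasks` array exists only so END can loop over task names without splitting the combined keys.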

Here is another alternative:

$ join -t, <(sort file){,} | sort -k1,1 -k2n -k3nr | rev | uniq -f 2 | rev
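
The one-liner above packs several steps together. A commented sketch of the same pipeline, with the `{,}` brace-expansion trick spelled out as one temporary file joined with itself, and `sort -t, -k1,1` so that `join` sees input sorted on its join field. The function name `minmax_join` is hypothetical, and `rev` is assumed to be available (it is common but not POSIX):

```shell
# Hypothetical helper: minmax_join FILE
minmax_join() {
    sorted=$(mktemp) || return 1
    # join needs its input sorted on the join field (the task name).
    sort -t, -k1,1 "$1" > "$sorted"
    # 1. Join the file with itself on the task name: every pair of times of a
    #    task becomes one line "TaskN, t1, t2".
    # 2. Sort by task, then t1 ascending, then t2 descending, so the first line
    #    of each task carries its smallest t1 and its largest t2.
    # 3. rev moves the task name to the end, uniq -f 2 ignores the two
    #    (reversed) time fields and keeps the first line per task, rev undoes it.
    join -t, "$sorted" "$sorted" |
        sort -k1,1 -k2n -k3nr |
        rev | uniq -f 2 | rev
    rm -f "$sorted"
}
```

Usage: `minmax_join file`. Note that the self-join produces n² lines for a task with n entries, so this is a neat trick rather than the cheapest option for large files.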

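Sort it on the first and second columns, then process it with awk. The nice thing about this solution (the awk part) is that it does not wait until the end to dump all results: as soon as it sees a new $1, it prints the min and max of the previous $1. Here: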

$ sort -t, -k1,1 -k2n foo |                   # sort by task, then numerically by time
awk 'BEGIN {FS=" *, *"; OFS=", "}            # print "task, min, max"
     !($1 in min) {min[$1]=$2+0}              # first of each is always min (and max)
     ($1 in min)  {max[$1]=$2+0}              # every current one is always max
     $1!=p && NR>1 {print p, min[p], max[p]}  # if $1 differs from previous, print previous
                   {p=$1}                     # p is current for next round
     END           {print p, min[p], max[p]}' # dump buffer
Task1, 1000, 2100
Task2, 1100, 3421
Task3, 1200, 3300
Task4, 1299, 1299
Task5, 1216, 2216
Task6, 2416, 2416
Task7, 1116, 1116
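
The awk part above still keeps a `min` and a `max` entry for every task in arrays. Because the input is sorted, only the current task is actually needed, so a variant can use plain scalars and keep memory use constant. A sketch under that assumption (one `task, time` pair per line); the function name `minmax_stream` is hypothetical:

```shell
# Hypothetical helper: minmax_stream FILE
# Needs sorted input; only the current task is kept, in scalar variables.
minmax_stream() {
    sort -t, -k1,1 -k2n "$1" |
    awk 'BEGIN { FS = " *, *"; OFS = ", " }
         !/,/ { next }                                # skip lines without a comma
         $1 != p { if (p != "") print p, min, max     # new task: flush the previous one
                   p = $1; min = $2 + 0 }             # its first (smallest) time is the min
         { max = $2 + 0 }                             # sorted, so the latest time is the max
         END { if (p != "") print p, min, max }'      # flush the last task
}
```

Usage: `minmax_stream foo`.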

Using sort, sed and awk:

sort -k1,1 -k2n input.txt | sed -r ':a;N;$!ba;:b;s/(Task[0-9]+,)([0-9 ,]+)\n?\1 ?([0-9]+)/\1\2, \3/g;tb;' | awk 'BEGIN{FS=OFS=", ";}{print $1, $2, $NF}'

An alternative solution using only sort and sed:

sort -k1,1 -k2n input.txt | sed -r ':a;N;$!ba;:b;s/(Task[0-9]+,)([0-9 ,]+)\n?\1 ?([0-9]+)/\1\2, \3/g;tb;' | sed -r -e 's/^([^ ]+)\s([^ ]+)\s.*\s([^ ]+)/\1 \2 \3/' -e 's/^([^ ]+)\s([^ ]+)$/\1 \2, \2/'

You get:

Task1, 1000, 2100 
Task2, 1100, 3421 
Task3, 1200, 3300 
Task4, 1299, 1299 
Task5, 1216, 2216 
Task6, 2416, 2416 
Task7, 1116, 1116 
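
The `sed` step is dense. A commented sketch of the sort, sed and awk pipeline, assuming GNU sed (`-E`, labels and `N` used as below); it allows an optional space after the comma so that lines like `Task1, 1389` are merged. The function name `minmax_sed` is hypothetical:

```shell
# Hypothetical helper: minmax_sed FILE (assumes GNU sed)
minmax_sed() {
    sort -k1,1 -k2n "$1" |
    sed -E '
        # Slurp the whole input into the pattern space.
        :a
        N
        $!ba
        # Append the time from the next line of the same task to the current
        # line; repeat (tb) until no adjacent lines share a task name.
        :b
        s/(Task[0-9]+,)([0-9 ,]+)\n?\1 ?([0-9]+)/\1\2, \3/g
        tb
    ' |
    awk 'BEGIN { FS = OFS = ", " } { print $1, $2, $NF }   # task, first time, last time'
}
```

Usage: `minmax_sed input.txt`. After the sed step each line looks like `Task1, 1000, 1389, 2100`, so the first and last numbers are the min and max.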

This is mostly bash. If you have problems with awk, I can replace the awk commands with something else (e.g. colrm, if the time is always in the same column).

# Keep a list of already processed task names (":"-separated)
already_processed=":"

# Use read to split each line on commas; only task[0] (the name) is used
while IFS=',' read -ra task; do
    name=${task[0]}
    # Skip blank lines and tasks that have already been processed
    if [ -z "$name" ] || echo "$already_processed" | grep -qF ":$name:"; then
        continue
    else
        # Select all the lines for this task (exact name match), take the
        #+second column and sort it numerically to find the min and the max.
        MIN=$(grep "^$name," "$1" | awk -F',' '{print $2+0}' | sort -n | head -n 1)
        MAX=$(grep "^$name," "$1" | awk -F',' '{print $2+0}' | sort -n | tail -n 1)
        # Add the task to the "already_processed" tasks (to be sure each task
        #+appears only once in the output)
        already_processed="$already_processed$name:"
        # Print the output in the wanted format.
        echo "${name}, ${MIN}, ${MAX}"
    fi
done < "$1"

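Just make sure your data file ends with a newline; otherwise `read` returns non-zero on the last line and the loop skips it.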

Example:

bash <name_of_script_file> <name_of_data_file> | sort  
Task1, 1000, 2100 
Task2, 1100, 3421 
Task3, 1200, 3300 
Task4, 1299, 1299 
Task5, 1216, 2216 
Task6, 2416, 2416 
Task7, 1116, 1116
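
The script above runs `grep` over the whole file twice per task. If awk is the problem, a hedged alternative is to sort once and let a plain POSIX shell loop track the current task; as a side effect, `sort` terminates the last line, so a missing final newline no longer matters. The function name `minmax_sorted_loop` is hypothetical:

```shell
# Hypothetical helper: minmax_sorted_loop FILE
# Sort once by task and numerically by time, then let the shell keep
# only the current task: its first time is the min, its last the max.
minmax_sorted_loop() {
    sort -t, -k1,1 -k2n "$1" | {
        prev=""
        while IFS=', ' read -r name time; do
            [ -z "$name" ] && continue               # skip blank lines
            if [ "$name" != "$prev" ]; then
                # New task: print the previous one before starting over.
                if [ -n "$prev" ]; then echo "$prev, $min, $max"; fi
                prev=$name
                min=$time
            fi
            max=$time                                # sorted, so latest is largest
        done
        if [ -n "$prev" ]; then echo "$prev, $min, $max"; fi   # print the last task
    }
}
```

Usage: `minmax_sorted_loop <name_of_data_file>`; the output is already sorted by task name.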