2016-09-28 122 views
0

Creating a pivot table in UNIX

Below is my input data, which I am trying to turn into a pivot table.

input.txt:

ID,CreateDate,Category,Region,PublishDate,Code,Listing,Type,ModifiedDate 
FRU426131598,22-Aug-16,SELLING,COUNTRY,22-Aug-16,1,SAMPLE,GRAPE,22-Aug-16 
FRU426175576,23-Aug-16,SELLING,COUNTRY,23-Aug-16,1,SAMPLE,APPLE,23-Aug-16 
FRU427163049,26-Aug-16,SELLING,COUNTRY,26-Aug-16,1,SAMPLE,APPLE,26-Aug-16 
FRU427163049,26-Aug-16,SELLING,COUNTRY,26-Aug-16,1,SAMPLE,APPLE,26-Aug-16 
FRU427163049,26-Aug-16,SELLING,COUNTRY,26-Aug-16,1,SAMPLE,GRAPE,26-Aug-16 
FRU427163049,26-Aug-16,SELLING,COUNTRY,26-Aug-16,1,SAMPLE,GRAPE,26-Aug-16 
FRU427163049,26-Aug-16,SELLING,COUNTRY,26-Aug-16,1,SAMPLE,APPLE,26-Aug-16 
FRU427163049,26-Aug-16,SELLING,COUNTRY,26-Aug-16,1,SAMPLE,APPLE,26-Aug-16 
FRU426972836,26-Aug-16,SELLING,COUNTRY,26-Aug-16,1,SAMPLE,ORANGE,26-Aug-16 
FRU427322180,28-Aug-16,SELLING,COUNTRY,28-Aug-16,1,SAMPLE,GRAPE,28-Aug-16 
FRU427032658,26-Aug-16,SELLING,COUNTRY,26-Aug-16,1,SAMPLE,APPLE,26-Aug-16 
FRU427373494,29-Aug-16,SELLING,COUNTRY,29-Aug-16,1,SAMPLE,GRAPE,29-Aug-16 
FRU427373069,29-Aug-16,SELLING,COUNTRY,29-Aug-16,1,SAMPLE,GRAPE,29-Aug-16 
FRU425669484,19-Aug-16,SELLING,COUNTRY,19-Aug-16,1,SAMPLE,APPLE,19-Aug-16 
FRU425616815,18-Aug-16,SELLING,COUNTRY,18-Aug-16,1,SAMPLE,APPLE,18-Aug-16 
FRU420018273,25-Sep-16,SELLING,COUNTRY,25-Sep-16,1,SAMPLE,ORANGE,25-Sep-16 
FRU435018589,25-Sep-16,SELLING,COUNTRY,25-Sep-16,1,SAMPLE,ORANGE,25-Sep-16 
FRU421375128,26-Sep-16,SELLING,COUNTRY,26-Sep-16,1,SAMPLE,APPLE,26-Sep-16 
FRU434911933,21-Sep-16,SELLING,COUNTRY,21-Sep-16,1,SAMPLE,ORANGE,21-Sep-16 
FRU434594125,21-Sep-16,SELLING,COUNTRY,21-Sep-16,1,SAMPLE,ORANGE,21-Sep-16 

The Type field should form the rows and the CreateDate field the columns, with the count of IDs as the values.

Desired output:

Row Labels   18-Aug-16  19-Aug-16  22-Aug-16  23-Aug-16  26-Aug-16  28-Aug-16  29-Aug-16  21-Sep-16  25-Sep-16  26-Sep-16  Grand Total
APPLE        1          1                     1          5                                                      1          9
GRAPE                              1                     2          1          2                                           6
ORANGE                                                   1                                2          2                     5
Grand Total  1          1          1          1          8          1          2          2          2          1          20

Is there a way to do this? I can get the count per CreateDate using awk, but I cannot build a pivot table with both rows and columns.

Answers

1

awk to the rescue!

This should get you started...

$ awk -F, -v OFS='\t' 'NR>1 {k=$(NF-1); d=$2; keys[k]; dates[d]; a[k,d]++} 
         END {line="Row Labels"; 
          for(d in dates) line = line OFS d; 
          print line; 
          for(k in keys) 
           {{line=k; 
           for(d in dates) line=line OFS a[k,d]} 
           print line}}' file  

Row Labels  19-Aug-16  29-Aug-16  23-Aug-16  18-Aug-16  28-Aug-16  22-Aug-16  26-Aug-16  26-Sep-16 21-Sep-16  25-Sep-16 
APPLE 1    1  1      5  1 
ORANGE             1    2  2 
GRAPE   2      1  1  2 

You may want to sort the dates (not so easy) and add the totals (easy).
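The totals can be added without any GNU extensions. Below is a minimal sketch, assuming the same layout as the question (CreateDate in field 2, Type in the second-to-last field). The function name `pivot_with_totals` and the variables `rowtot`, `coltot`, and `grand` are made up for illustration. The columns still come out in whatever order awk iterates over the array, so the dates are not sorted.

```shell
# Hypothetical helper: reads the CSV from a file argument or from stdin.
pivot_with_totals() {
    awk -F, -v OFS='\t' '
    NR > 1 {
        k = $(NF-1); d = $2          # Type and CreateDate
        keys[k]; dates[d]            # remember every row and column label
        a[k, d]++                    # count per (type, date) cell
        rowtot[k]++; coltot[d]++; grand++
    }
    END {
        line = "Row Labels"
        for (d in dates) line = line OFS d
        print line OFS "Grand Total"

        for (k in keys) {
            line = k
            for (d in dates) line = line OFS a[k, d]
            print line OFS rowtot[k]
        }

        line = "Grand Total"
        for (d in dates) line = line OFS coltot[d]
        print line OFS grand
    }' "$@"
}
```

Usage would be `pivot_with_totals input.txt`, optionally piped through `column -s$'\t' -t` for alignment.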

0

Here is a way to sort the dates. It requires GNU awk.

awk -F, ' 
    function date2epoch(date, arr,mon) { 
     split(date, arr, /-/) 
     mon = (index("JanFebMarAprMayJunJulAugSepOctNovDec", arr[2]) - 1)/3 + 1 
     return mktime("20" arr[3] " " mon " " arr[1] " 0 0 0") 
    } 
    NR > 1 { 
     d = date2epoch($2)   # CreateDate is field 2; $NF would be ModifiedDate
     dates[d] 
     count[$(NF-1)][d]++ 
     total[d]++ 
    } 
    END { 
     PROCINFO["sorted_in"] = "@ind_str_asc" 

     printf "Row Label" 
     for (d in dates) 
      printf "\t%s", strftime("%d-%b-%y", d) 
     print "" 

     for (type in count) { 
      printf "%s", type 
      for (d in dates) 
       printf "\t%s", count[type][d] 
      print "" 
     } 

     printf "Total" 
     for (d in dates) 
      printf "\t%s", total[d] 
     print "" 
    } 
' file 
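
If GNU awk is not available, one portable alternative (not the approach used above) is to build a sortable YYYYMMDD key in awk and let `sort` do the ordering. This is a rough sketch, assuming every year is 20xx and CreateDate is field 2. `sorted_dates` is a hypothetical helper name.

```shell
# Hypothetical helper: prints the distinct CreateDate values, oldest first.
sorted_dates() {
    awk -F, 'NR > 1 {
        split($2, p, "-")            # p[1]=day, p[2]=month name, p[3]=2-digit year
        # Position of the month name in the string, converted to 1..12
        m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", p[2]) + 2) / 3
        # Emit "YYYYMMDD original-date" so a plain sort is chronological
        printf "20%02d%02d%02d %s\n", p[3], m, p[1], $2
    }' "$@" | sort -u | cut -d" " -f2
}
```

The resulting list could then be fed to a second awk pass to fix the column order of the pivot table.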
0

With GNU awk 4.* for true multi-dimensional arrays and sorted_in:

$ cat tst.awk 
BEGIN { FS=","; OFS="\t" } 
NR>1 { 
    split($2,t,/-/) 
    date = sprintf("%02d%02d%02d",t[3],(match("JanFebMarAprMayJunJulAugSepOctNovDec",t[2])+2)/3,t[1]) 
    dateNames[date] = $2 
    fruitCnts[$8][date]++ 
} 
END { 
    PROCINFO["sorted_in"] = "@ind_str_asc" 

    printf "%s%s", "Row Labels", OFS 
    for (date in dateNames) { 
     printf "%s%s", dateNames[date], OFS 
    } 
    print "Grand Total" 

    for (fruit in fruitCnts) { 
     fruitTotal = 0 
     printf "%s%s", fruit, OFS 
     for (date in dateNames) { 
      cnt = (date in fruitCnts[fruit] ? fruitCnts[fruit][date] : "") 
      printf "%s%s", cnt, OFS 
      dateTotals[date] += cnt 
      fruitTotal += cnt 
     } 
     print fruitTotal 
    } 

    printf "%s%s", "Grand Total", OFS 
    for (date in dateNames) { 
     printf "%s%s", dateTotals[date], OFS 
     total += dateTotals[date] 
    } 
    print total 
} 

$ awk -f tst.awk file 
Row Labels  18-Aug-16  19-Aug-16  22-Aug-16  23-Aug-16  26-Aug-16  28-Aug-16  29-Aug-16  21-Sep-16 25-Sep-16  26-Sep-16  Grand Total 
APPLE 1  1    1  5          1  9 
GRAPE     1    2  1  2        6 
ORANGE         1      2  2    5 
Grand Total  1  1  1  1  8  1  2  2  2  1  20 

$ awk -f tst.awk file | column -s$'\t' -t 
Row Labels   18-Aug-16  19-Aug-16  22-Aug-16  23-Aug-16  26-Aug-16  28-Aug-16  29-Aug-16  21-Sep-16  25-Sep-16  26-Sep-16  Grand Total
APPLE        1          1                     1          5                                                      1          9
GRAPE                              1                     2          1          2                                           6
ORANGE                                                   1                                2          2                     5
Grand Total  1          1          1          1          8          1          2          2          2          1          20
$