2015-06-28

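Format and filter a log file into a CSV table

PS: This question was inspired by an earlier question (linked as "here" in the original post), with slight changes.

I have a file that contains many log entries: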

at 10:00 carl 1 STR0 STR1 STR2 STR3 <STR4 STR5> [STR6 STR7] STR8: 
academy/course1:oftheory:SMTGHO:nothing: 
academy/course1:ofapplicaton:SMTGHP:onehour: 

at 10:00 carl 2 STR0 STR1 STR2 STR3 <STR4 STR78> [STR6 STR111] STR8: 
academy/course2:oftheory:SMTGHM:math: 
academy/course2:ofapplicaton:SMTGHN:twohour: 

at 10:00 david 1 STR0 STR1 STR2 STR3 <STR4 STR758> [STR6 STR155] STR8: 
academy/course3:oftheory:SMTGHK:geo: 
academy/course3:ofapplicaton:SMTGHL:halfhour: 

at 10:00 david 2 STR0 STR1 STR2 STR3 <STR4 STR87> [STR6 STR74] STR8: 
academy/course4:oftheory:SMTGH:SMTGHI:history: 
academy/course4:ofapplicaton:SMTGHJ:nothing: 

at 14:00 carl 1 STR0 STR1 STR2 STR3 <STR4 STR11> [STR6 STR784] STR8: 
academy/course5:oftheory:SMTGHG:nothing: 
academy/course5:ofapplicaton:SMTGHH:twohours: 

at 14:00 carl 2 STR0 STR1 STR2 STR3 <STR4 STR86> [STR6 STR85] STR8: 
academy/course6:oftheory:SMTGHE:music: 
academy/course6:ofapplicaton:SMTGHF:twohours: 

at 14:00 david 1 STR0 STR1 STR2 STR3 <STR4 STR96> [STR6 STR01] STR8: 
academy/course7:oftheory:SMTGHC:programmation: 
academy/course7:ofapplicaton:SMTGHD:onehours: 

at 14:00 david 2 STR0 STR1 STR2 STR3 <STR4 STR335> [STR6 STR66] STR8: 
academy/course8:oftheory:SMTGHA:philosophy: 
academy/course8:ofapplicaton:SMTGHB:nothing: 

I tried applying the following code, but in vain:

BEGIN { 
    # set records separated by empty lines 
    RS="" 
    # set fields separated by newline, each record has 3 fields 
    FS="\n" 
} 
{ 
    # remove undesired parts of every first line of a record 
    sub("at ", "", $1) 
    # now store the rest in time and course 
    time=$1 
    course=$1 
    # remove time from string to extract the course title 
    sub("^[^ ]* ", "", course) 
    # remove course title to retrieve time from string 
    sub(course, "", time) 
    # get theory info from second line per record 
    sub("course:theory:", "", $2) 
    # get application info from third line 
    sub("course:applicaton:", "", $3) 
    # if new course 
    if (! (course in header)) { 
     # save header information (first words of each line in output) 
     header[course] = course 
     theory[course] = "theory" 
     app[course] = "application" 
    } 
    # append the relevant info to the output strings 
    header[course] = header[course] "," time 
    theory[course] = theory[course] "," $2 
    app[course] = app[course] "," $3 

} 
END { 
    # now for each course found 
    for (key in header) { 
     # print the strings constructed 
     print header[key] 
     print theory[key] 
     print app[key] 
     print "" 
    }
}

Is there any way to get rid of these STR* and SMTGH* strings in order to get this output:

carl 1,10:00,14:00
applicaton,onehour,twohours
theory,nothing,nothing

carl 2,10:00,14:00
applicaton,twohour,twohours
theory,math,music

david 1,10:00,14:00
applicaton,halfhour,onehours
theory,geo,programmation

david 2,10:00,14:00
applicaton,nothing,nothing
theory,history,philosophy

Answers

With GNU awk:

awk -F: -v OFS=, '
    # header line of a record: "at TIME NAME NUMBER ..."
    /^at/ {
        split($0, f, " ")
        time = f[2]
        course = f[3] " " f[4]
        times[course] = times[course] OFS time
    }
    # detail lines: the wanted value is the field just before the trailing colon
    $2 == "oftheory"     {th[course] = th[course] OFS $(NF-1)}
    $2 == "ofapplicaton" {ap[course] = ap[course] OFS $(NF-1)}
    END {
        # gawk only: visit the array keys in ascending string order
        PROCINFO["sorted_in"] = "@ind_str_asc"
        for (c in times) {
            printf "%s%s\n", c, times[c]
            printf "application%s\n", ap[c]
            printf "theory%s\n", th[c]
            print ""
        }
    }
' file
carl 1,10:00,14:00 
application,onehour,twohours 
theory,nothing,nothing 

carl 2,10:00,14:00 
application,twohour,twohours 
theory,math,music 

david 1,10:00,14:00 
application,halfhour,onehours 
theory,geo,programmation 

david 2,10:00,14:00 
application,nothing,nothing 
theory,history,philosophy
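
The script above depends on `PROCINFO["sorted_in"]`, which is a gawk extension; other awk implementations visit array keys in an unspecified order. Below is a minimal sketch of a variant that should also work with a plain POSIX awk, assuming it is acceptable to print each person in order of first appearance in the log rather than sorted by name. The names `to_csv` and `order` are illustrative and not part of the original answer.

```shell
# to_csv [FILE...]: hypothetical wrapper; reads stdin when no file is given.
to_csv() {
    awk -F: -v OFS=, '
        /^at / {
            split($0, f, " ")
            course = f[3] " " f[4]
            # remember each key once, in order of first appearance
            if (!(course in times)) order[++n] = course
            times[course] = times[course] OFS f[2]
        }
        # the wanted value is the field just before the trailing colon
        $2 == "oftheory"     { th[course] = th[course] OFS $(NF-1) }
        $2 == "ofapplicaton" { ap[course] = ap[course] OFS $(NF-1) }
        END {
            for (i = 1; i <= n; i++) {
                c = order[i]
                print c times[c]
                print "application" ap[c]
                print "theory" th[c]
                print ""
            }
        }
    ' "$@"
}

# Example call on two records taken from the question.
to_csv <<'EOF'
at 10:00 carl 1 STR0 STR1 STR2 STR3 <STR4 STR5> [STR6 STR7] STR8:
academy/course1:oftheory:SMTGHO:nothing:
academy/course1:ofapplicaton:SMTGHP:onehour:

at 14:00 carl 1 STR0 STR1 STR2 STR3 <STR4 STR11> [STR6 STR784] STR8:
academy/course5:oftheory:SMTGHG:nothing:
academy/course5:ofapplicaton:SMTGHH:twohours:
EOF
```

Using `$(NF-1)` rather than a fixed field number keeps lines such as `academy/course4:oftheory:SMTGH:SMTGHI:history:`, which carry an extra colon-separated field, from shifting the extracted value.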