2011-07-25 44 views

Selecting data rows in awk

I am processing a data file with awk. It looks like this:

YEARS:1995:1996:1997:1998:1999:2000 
VISITS 
Domain1:259:2549:23695:24889:1240:21202 
Domain2:32632:87521:147122:22952:2365:121230 
Domain3:5985:92104:921744:43124:74234:68350 
Domain4:8321:36520:68712:32102:22003:82100 
SIGNUPS 
Domain1:212:202:992:1202:986:3253 
Domain2:10401:44522:20103:3595:11410:353 
Domain3:3695:23230:452030:25052:9858:3020 
Domain4:969:24247:9863:24101:5541:3663 

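I need the total visits and signups per year and per domain. My problem is that I can't find a way to select only the first four data lines and the last four data lines. Can anyone give me some hints on how to achieve this?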

Sample output (visits only):

VISITS 
Domain1  73834 
Domain2  413822 
Domain3  1205541 
Domain4  249758

     1995 1996 1997 1998 1999 2000 
All  47197 218694 1161273 123067 99842 292882 
Could you please post an example of the expected output based on the input you provided?

Answers

1

You can match the "VISITS" and "SIGNUPS" lines and set a variable that records which kind of record is currently being processed.

An example:

BEGIN {
    FS = ":";
}
/^YEARS/ {
    for (i = 2; i <= NF; i++) {
        year[i] = $i;
    }
    next;
}
/^VISITS/ {
    mode = "VISITS";
    next;
}
/^SIGNUPS/ {
    mode = "SIGNUPS";
    next;
}
{
    for (i = 2; i <= NF; i++) {
        # output "VISITS"/"SIGNUPS", domain, year, value
        print mode, $1, year[i], $i;
    }
}
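
The script above only flattens the data into one record per mode, domain and year. It does not total anything yet. As a minimal sketch of one way to get per-domain totals from it, the flattened records can be piped into a second awk pass. The input here is a shortened version of the question's data (two years, two domains), and the `result` variable and the `total` array are names chosen for this illustration only.

```shell
# Input: first two years and two domains of the question's data (shortened).
result=$(printf '%s\n' \
    'YEARS:1995:1996' \
    'VISITS' \
    'Domain1:259:2549' \
    'Domain2:32632:87521' \
    'SIGNUPS' \
    'Domain1:212:202' \
    'Domain2:10401:44522' |
awk -F: '
    # First pass: same idea as the script above, a "mode" variable tags each record.
    /^YEARS/   { for (i = 2; i <= NF; i++) year[i] = $i; next }
    /^VISITS/  { mode = "VISITS";  next }
    /^SIGNUPS/ { mode = "SIGNUPS"; next }
    NF > 1     { for (i = 2; i <= NF; i++) print mode, $1, year[i], $i }
' |
awk '
    # Second pass: field 1 is the mode, field 2 the domain, field 4 the value.
    { total[$1 " " $2] += $4 }
    END { for (k in total) print k, total[k] }
' | LC_ALL=C sort)   # sort because "for (k in total)" has no defined order
printf '%s\n' "$result"
```

Summing on `$1 " " $3` instead of `$1 " " $2` in the second pass would give the per-year totals.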
1
awk -F: 'END { out() }
/^YEARS/ {
    # remember the year labels and build the header row
    for (i = 1; ++i <= NF;) {
        y[i] = $i
        yh = yh ? yh OFS $i : $i
    }
    ny = NF; next
}
NF == 1 {
    # section header (VISITS/SIGNUPS): print the previous section, if any
    m && out(); m = $1
}
{
    ym[y[1]] = "ALL:"
    for (i = 1; ++i <= NF;) {
        d[$1] += $i; ym[y[i]] += $i
    }
}
function out() {
    print m
    for (D in d) print D, d[D]
    printf "\n%s\n", OFS yh
    for (i = 0; ++i <= ny;)
        printf "%s", (ym[y[i]] (i < ny ? OFS : RS))
    # x is never assigned, so it is empty: split(x, arr) clears arr
    print x; split(x, d); split(x, ym)
}' OFS='\t' infile

With GNU awk, you can use:

delete d; delete ym 

instead of:

split(x, d); split(x, ym) 
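
To illustrate why `split` works as a reset: splitting an empty string into an array leaves that array with no elements. A small sketch (the array contents and the `count` helper are made up for this demonstration):

```shell
result=$(awk '
function count(arr,   k, n) {       # number of elements currently in arr
    n = 0
    for (k in arr) n++
    return n
}
BEGIN {
    d["Domain1"] = 1; d["Domain2"] = 2
    before = count(d)
    split("", d)                     # clears the whole array
    print before, count(d)
}')
printf '%s\n' "$result"
```

In awk implementations that accept `delete d` for a whole array, such as GNU awk, that statement has the same effect and states the intent more directly.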
1

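When you say "select only the first four and the last four lines", I assume you mean processing the visits and the signups separately: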

awk -F: ' 
$1 == "YEARS" {for (i=2; i<=NF; i++) {yr[i] = $i}; next} 
$1 == "VISITS" {visits = 1; signups = 0; next} 
$1 == "SIGNUPS" {visits = 0; signups = 1; next} 
visits {
    for (i=2; i<=NF; i++) {
        v_d[$1] += $i     # visits by domain
        v_y[yr[i]] += $i  # visits by year
    }
}
signups {
    for (i=2; i<=NF; i++) {
        s_d[$1] += $i     # signups by domain
        s_y[yr[i]] += $i  # signups by year
    }
}
END { 
    OFS=FS 
    print "VISITS" 
    for (d in v_d) print d, v_d[d] 
    for (y in v_y) print y, v_y[y] 
    print "SIGNUPS" 
    for (d in s_d) print d, s_d[d] 
    for (y in s_y) print y, s_y[y] 
}' 

Given your input, this outputs:

VISITS 
Domain1:73834 
Domain2:413822 
Domain3:1205541 
Domain4:249758 
1999:99842 
2000:292882 
1995:47197 
1996:218694 
1997:1161273 
1998:123067 
SIGNUPS 
Domain1:6847 
Domain2:90384 
Domain3:516885 
Domain4:68384 
1999:27795 
2000:10289 
1995:15277 
1996:92201 
1997:482988 
1998:53950
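
The years above come out as 1999, 2000, 1995, ... because awk does not define the order in which `for (key in array)` visits the keys. If input order matters, one option is to index the totals by position and remember the domains in the order they first appear. The following is a sketch along those lines, not a drop-in replacement. The names `order`, `dom_total`, `yr_total` and `report` are invented here, and the "Year"/"All" row labels are an assumption modelled on the layout requested in the question.

```shell
data='YEARS:1995:1996:1997:1998:1999:2000
VISITS
Domain1:259:2549:23695:24889:1240:21202
Domain2:32632:87521:147122:22952:2365:121230
Domain3:5985:92104:921744:43124:74234:68350
Domain4:8321:36520:68712:32102:22003:82100
SIGNUPS
Domain1:212:202:992:1202:986:3253
Domain2:10401:44522:20103:3595:11410:353
Domain3:3695:23230:452030:25052:9858:3020
Domain4:969:24247:9863:24101:5541:3663'

result=$(printf '%s\n' "$data" | awk -F: '
$1 == "YEARS" { ny = NF; for (i = 2; i <= NF; i++) yr[i] = $i; next }
NF == 1 {                              # section header: VISITS or SIGNUPS
    if (section != "") report()
    section = $1
    next
}
NF > 1 {
    if (!($1 in dom_total)) order[++nd] = $1   # remember first-seen order
    for (i = 2; i <= NF; i++) {
        dom_total[$1] += $i            # total per domain
        yr_total[i]   += $i            # total per year column
    }
}
END { if (section != "") report() }

function report(   i, line) {
    print section
    for (i = 1; i <= nd; i++) print order[i], dom_total[order[i]]
    line = "Year"
    for (i = 2; i <= ny; i++) line = line " " yr[i]
    print line
    line = "All"
    for (i = 2; i <= ny; i++) line = line " " yr_total[i]
    print line
    split("", dom_total); split("", yr_total); nd = 0   # reset for next section
}')
printf '%s\n' "$result"
```

Each section is printed as soon as the next header (or the end of input) is reached, so only one section's totals are held in memory at a time.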