Print search pattern in awk

I want to print the lines that match a search pattern and then compute the average over those lines. An example explains it best:

Input file:

chr17 41275978 41276294 BRCA1_ex02_01 278 
chr17 41275978 41276294 BRCA1_ex02_01 279 
chr17 41275978 41276294 BRCA1_ex02_01 280 
chr17 41275978 41276294 BRCA1_ex02_02 281 
chr17 41275978 41276294 BRCA1_ex02_02 282 
chr17 41275978 41276294 BRCA1_ex02_03 283 
chr17 41275978 41276294 BRCA1_ex02_03 284 
chr17 41275978 41276294 BRCA1_ex02_03 285 
chr17 41275978 41276294 BRCA1_ex02_04 286 
chr17 41275978 41276294 BRCA1_ex02_04 287 
chr17 41275978 41276294 BRCA1_ex02_04 288

I want to extract (in a bash loop, for example) the lines that share the same fourth column:

OUTPUT1:

chr17 41275978 41276294 BRCA1_ex02_01 278 
chr17 41275978 41276294 BRCA1_ex02_01 279 
chr17 41275978 41276294 BRCA1_ex02_01 280

OUTPUT2:

chr17 41275978 41276294 BRCA1_ex02_02 281 
chr17 41275978 41276294 BRCA1_ex02_02 282

OUTPUT3:

chr17 41275978 41276294 BRCA1_ex02_03 283 
chr17 41275978 41276294 BRCA1_ex02_03 284 
chr17 41275978 41276294 BRCA1_ex02_03 285

and so on. After that, computing the average of the fifth column is easy:

awk '{sum += $5} END {if (NR > 0) print sum/NR}' in_file.txt

In my case there are thousands of lines of the form BRCA1_exXX_XX, so any idea how to split it?

Paul.


2014-07-07 Geroge

Assuming the entries are sorted by the 4th column, as in the given data, you can do this:

awk '
    $4 != prev {                  # this line's 4th column differs from the previous line's
        if (cnt > 0)              # a previous group exists
            print prev, sum/cnt   # print that group's average
        prev = $4                 # remember the current 4th column
        sum = $5                  # initialize sum to column 5
        cnt = 1                   # initialize count to 1
        next                      # go to next line
    }

    {
        sum += $5                 # accumulate total of 5th column
        ++cnt                     # increment count of lines
    }

    END {
        if (cnt > 0)              # avoid dividing by 0 on an empty file
            print prev, sum/cnt   # print the average for the last group
    }
' file


2014-07-07 14:44:31 ooga

This assumes the entries are always in order. –

Wow, it looks like it works :-) Thanks! Could you explain it? And could I add the standard deviation as a third column? – Geroge

@EtanReisner Yes, it assumes the entries are sorted by the 4th column, as shown in the given data. – ooga
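
On the standard deviation asked about in the comment above: one possible extension, offered as a minimal sketch and not as ooga's own code, is to keep a running sum of squares next to the running sum. It assumes, like the answer, that rows are already grouped by column 4, and it computes the population standard deviation (dividing by n). The `report` function, the `sumsq` variable, the temporary sample file, and the output format are illustrative choices. The sum-of-squares formula can lose precision when values are very large, so treat it as a starting point.

```shell
#!/bin/sh
# Sample data: a subset of the rows from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
chr17 41275978 41276294 BRCA1_ex02_01 278
chr17 41275978 41276294 BRCA1_ex02_01 279
chr17 41275978 41276294 BRCA1_ex02_01 280
chr17 41275978 41276294 BRCA1_ex02_02 281
chr17 41275978 41276294 BRCA1_ex02_02 282
EOF

# Assumption: rows are already grouped by column 4.
stats=$(awk '
function report(    mean, var) {       # extra parameters act as local variables
    mean = sum / cnt
    var  = sumsq / cnt - mean * mean   # population variance
    if (var < 0) var = 0               # guard against rounding error
    printf "%s %.2f %.4f\n", prev, mean, sqrt(var)
}
$4 != prev {                           # a new group starts on this line
    if (cnt > 0) report()              # finish the previous group
    prev = $4; sum = 0; sumsq = 0; cnt = 0
}
{ sum += $5; sumsq += $5 * $5; cnt++ } # runs for every line, including the first of a group
END { if (cnt > 0) report() }          # finish the last group
' "$sample")

printf '%s\n' "$stats"
rm -f "$sample"
```

Each output line holds the group name, the mean, and the standard deviation. For a sample standard deviation, the variance would be scaled by cnt / (cnt - 1), with a guard for groups of one line.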

I think this will do what you want.

awk '{
    # Keep a running sum of the fifth column, keyed by the fourth column.
    v[$4] += $5
    # Keep a count of the lines sharing each fourth-column value.
    n[$4]++
}

END {
    # Loop over all the values we saw and print each fourth-column value with the average of its fifth column.
    for (val in n) {
        print val ": " v[val]/n[val]
    }
}' "$file"
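
One detail worth knowing about the `for (val in n)` loop: awk does not specify the order in which array keys are visited, so the groups may come out in any order. A minimal sketch of one way to get stable output is to pipe the result through `sort`. The variable names `total` and `count`, the temporary sample file, and the interleaved sample rows are assumptions made for illustration. The rows are interleaved on purpose, to show that this array-based approach does not need sorted input.

```shell
#!/bin/sh
# Sample data: rows from the question, deliberately interleaved.
sample=$(mktemp)
cat > "$sample" <<'EOF'
chr17 41275978 41276294 BRCA1_ex02_01 278
chr17 41275978 41276294 BRCA1_ex02_02 281
chr17 41275978 41276294 BRCA1_ex02_01 279
chr17 41275978 41276294 BRCA1_ex02_02 282
chr17 41275978 41276294 BRCA1_ex02_01 280
EOF

averages=$(awk '
{
    total[$4] += $5     # running sum of column 5 per column-4 value
    count[$4]++         # number of lines per column-4 value
}
END {
    for (key in count)  # iteration order is unspecified in awk
        print key ": " total[key] / count[key]
}' "$sample" | LC_ALL=C sort)   # sort makes the output order deterministic

printf '%s\n' "$averages"
rm -f "$sample"
```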


2014-07-07 14:51:35

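Never use the letter 'l' as a variable name, because it looks too much like the number '1'. In some fonts the two are completely indistinguishable. –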

@EdMorton Fair enough. I was using it to stand for "line", but that didn't make much sense here either. Edited. –

Yes, this is great, it works very well. Thanks for the explanation! – Geroge
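
Both answers compute the averages directly. If the separate per-group files described in the question (OUTPUT1, OUTPUT2, ...) are wanted as well, the following is a rough sketch of one way to write them. It assumes the input is sorted by column 4, so each output file is opened only once and can be closed as soon as the group changes, which matters when there are thousands of groups. The `.txt` suffix, the `dir` variable, and the temporary working directory are hypothetical choices for this example.

```shell
#!/bin/sh
# Work in a scratch directory with a subset of the question's rows.
workdir=$(mktemp -d)
cat > "$workdir/in_file.txt" <<'EOF'
chr17 41275978 41276294 BRCA1_ex02_01 278
chr17 41275978 41276294 BRCA1_ex02_01 279
chr17 41275978 41276294 BRCA1_ex02_01 280
chr17 41275978 41276294 BRCA1_ex02_02 281
chr17 41275978 41276294 BRCA1_ex02_02 282
EOF

# Assumption: input is sorted by column 4, so each file is written in one go.
awk -v dir="$workdir" '
$4 != prev {
    if (out != "") close(out)   # keep only one output file open at a time
    prev = $4
    out = dir "/" $4 ".txt"     # one file per column-4 value
}
{ print > out }
' "$workdir/in_file.txt"

listing=$(cd "$workdir" && ls BRCA1_*.txt)
first_group=$(cat "$workdir/BRCA1_ex02_01.txt")
printf '%s\n' "$listing"
rm -rf "$workdir"
```

With unsorted input, a file reopened with `>` after `close()` would be truncated, so this sketch should only be used on input that is sorted or pre-sorted by column 4.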
