2016-09-13 37 views

Grabbing and splitting specific lines with one or more instances

Given a .txt file (a DNA sequence alignment report) in the following format:

5463784 reads; of these: 
    5463784 (100.00%) were paired; of these: 
    841569 (15.40%) aligned concordantly 0 times 
    4469608 (81.80%) aligned concordantly exactly 1 time 
    152607 (2.79%) aligned concordantly >1 times 
    ---- 
    841569 pairs aligned 0 times concordantly or discordantly; of these: 
     1683138 mates make up the pairs; of these: 
     1407028 (83.60%) aligned 0 times 
     226521 (13.46%) aligned exactly 1 time 
     49589 (2.95%) aligned >1 times 
87.12% overall alignment rate 

What is the simplest way to grab a sub-part of specific lines? For example, if I want to grab the lines containing "exactly", I can use:

awk '/exactly/{print}' 

This returns:

4469608 (81.80%) aligned concordantly exactly 1 time 
226521 (13.46%) aligned exactly 1 time 

But I do not know how to split what is returned so that I get 4469608 and 226521 in an array, and then sum them so that a variable is set to 4696129.

+1

`awk '/exactly/{print $1}'` will print the first field. You can then sum the values, e.g. like this: `awk '/exactly/{sum += $1} END {print sum}'`

+0

Interesting. So there is no need to specify a delimiter? – AnnaSchumann

+1

The default field separator already handles whitespace. You can change it on the command line with `-F';'` or inside the program with `FS=";"`.
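To illustrate the comment above, here is a minimal sketch of both ways of setting the field separator. The semicolon-delimited line is made up for the example (the report in the question is whitespace-delimited), and the variable names `line`, `first`, and `second` are arbitrary.

```shell
# Hypothetical semicolon-delimited line, used only to illustrate the field separator.
line='4469608;81.80%;aligned concordantly exactly 1 time'

# Set the separator on the command line with -F.
first=$(printf '%s\n' "$line" | awk -F';' '{ print $1 }')

# Set the separator inside the program by assigning FS in a BEGIN block.
second=$(printf '%s\n' "$line" | awk 'BEGIN { FS = ";" } { print $2 }')

echo "$first $second"
```

With the default separator neither setting is needed, because awk splits on runs of blanks and ignores leading whitespace.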

Answer

1
awk '/exactly/ {sum=sum+$1;}END{print sum}' dna 

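For each line that contains "exactly", this adds the value of the first column to an awk variable called sum, and prints the total at the end.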
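The question also asks for the individual values in an array and for the total in a variable. The following is one possible sketch, not the only way to do it: it assumes the array can live inside awk rather than in the shell, and it feeds awk a few lines copied from the report through a shell variable instead of a file. The names `report`, `counts`, and `total` are arbitrary.

```shell
# Sample lines copied from the report in the question.
report='    4469608 (81.80%) aligned concordantly exactly 1 time
     226521 (13.46%) aligned exactly 1 time
87.12% overall alignment rate'

# Keep each matching count in an awk array, then print the values and their sum.
printf '%s\n' "$report" | awk '
    /exactly/ { counts[++n] = $1 }
    END {
        for (i = 1; i <= n; i++) {
            print "count " i ": " counts[i]
            sum += counts[i]
        }
        print "sum: " (sum + 0)
    }'

# Capture only the sum in a shell variable.
total=$(printf '%s\n' "$report" | awk '/exactly/ { sum += $1 } END { print sum + 0 }')
echo "$total"
```

Printing `sum + 0` rather than `sum` makes awk output 0 instead of an empty line when no line matches.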