2013-12-17 25 views
2

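Separate and count elements in a list by condition

I want to separate and count the elements in my input list. input.txt contains 2 columns: $1 is the element ID and $2 is its ratio (a number).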

ENSG001 12.3107448237 
ENSG007 4.3602275 
ENSG008 2.9918420285 
ENSG009 1.035588 
ENSG010 0.999864 
ENSG012 0.569833 
ENSG013 0.495325 
ENSG014 0.253893 
ENSG015 0.125389 
ENSG017 0.012568 
ENSG018 -0.135689 
ENSG020 -0.4938497942 
ENSG022 -0.6429221854 
ENSG024 -1.1759339381 
ENSG029 -4.2722999766 
ENSG030 -11.8447513281 

I want to split the ratios into the following categories:

Greater than or equal to 2 
Between 1 and 2 
Between 0.5 and 1 
Between -0.5 and 0.5 
Between -1 and -0.5 
Between -2 and -1 
Less than or equal to -2

and then print the count for each category to a single, separate output file, RESULTS.TXT:

Total 16 
> 2 3 
1 to 2 1 
0.5 to 1 2 
-0.5 to 0.5 6 
-0.5 to -1 1 
-1 to -2  1 
< -2 2 

I can do this on the command line with the following commands:

awk '$2 > 2 {print $1,$2}' input.txt | wc -l
awk '$2 > 1 && $2 < 2 {print $1,$2}' input.txt | wc -l
awk '$2 > 0.5 && $2 < 1 {print $1,$2}' input.txt | wc -l
awk '$2 > -0.5 && $2 < 0.5 {print $1,$2}' input.txt | wc -l
awk '$2 > -1 && $2 < -0.5 {print $1,$2}' input.txt | wc -l
awk '$2 > -2 && $2 < -1 {print $1,$2}' input.txt | wc -l
awk '$2 < -2 {print $1,$2}' input.txt | wc -l

I think there is a faster way using a shell script with a while or for loop, but I don't know how to do it. Any suggestions would be great.

Answers

3

You can process the file in a single pass. A simple way is:

awk '$2>=2{a++;next}
$2>=1 && $2<2 {b++;next}
$2>=0.5 && $2<1 {c++;next}
...
$2<=-2{x++;next}
END{print "total:",NR;
    print ">=2:",a+0;
    print "1-2:",b+0;
    ...
    print "<=-2:",x+0
}' file
+0

Ah, more concise than mine ^^ +1 –

1

One way is to count every category you are interested in with a single awk command, in one pass over the file.

#!/bin/bash 

if [ $# -ne 1 ] 
then 
    echo "Usage: $0 INPUT" 
    exit 1 
fi 

awk ' { 
    if  ($2 > 2) count[0]++ 
    else if ($2 > 1) count[1]++ 
    else if ($2 > 0.5) count[2]++ 
    else if ($2 > -0.5) count[3]++ 
    else if ($2 > -1) count[4]++ 
    else if ($2 > -2) count[5]++ 
    else count[6]++ 
} END { 
    print "  > 2\t", count[0] 
    print " 1 to 2\t", count[1] 
    print " 0.5 to 1\t", count[2] 
    print "-0.5 to 0.5\t", count[3] 
    print "-1 to -0.5\t", count[4] 
    print "-2 to -1\t", count[5] 
    print "  < -2\t", count[6] 
}' "$1"
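
To get the layout asked for in the question, including the Total line, the same if/else chain could be wrapped in a function whose output is redirected to a file. This is only a sketch under a few assumptions: the function name `count_ratios` is hypothetical, intervals are treated as half-open (a value equal to a boundary goes into the higher category), and the sample rows are made up purely to exercise it.

```shell
#!/bin/sh
# count_ratios is a hypothetical helper name, not part of the original answer.
# It prints "label<TAB>count" lines for the file given as $1.
count_ratios() {
    awk '
    NF < 2 { next }                      # skip blank or malformed lines
    {
        total++
        if      ($2 >=  2)   count[0]++
        else if ($2 >=  1)   count[1]++
        else if ($2 >=  0.5) count[2]++
        else if ($2 >= -0.5) count[3]++
        else if ($2 >= -1)   count[4]++
        else if ($2 >= -2)   count[5]++
        else                 count[6]++
    }
    END {
        # %d prints 0 for categories that never matched.
        printf "Total\t%d\n",       total
        printf ">= 2\t%d\n",        count[0]
        printf "1 to 2\t%d\n",      count[1]
        printf "0.5 to 1\t%d\n",    count[2]
        printf "-0.5 to 0.5\t%d\n", count[3]
        printf "-1 to -0.5\t%d\n",  count[4]
        printf "-2 to -1\t%d\n",    count[5]
        printf "< -2\t%d\n",        count[6]
    }' "$1"
}

# Made-up sample rows, only to exercise the function.
sample=$(mktemp)
printf '%s\n' 'A 2.5' 'B 1.5' 'C 0.7' 'D 0.1' 'E -0.7' 'F -1.5' 'G -3' 'H 0' > "$sample"
count_ratios "$sample"
rm -f "$sample"
```

For the file from the question this would be run as `count_ratios input.txt > RESULTS.TXT`.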
2

You can simply take the numbers, sort them with sort, and then count the entries that fall in each interval. For example, given your input:

cut -f 2 -d ' ' input.txt | sort -nr | awk ' 
    BEGIN { split("2 1 0.5 -0.5 -1 -2", inter); i = 1; } 
    { 
     if (i > 6) { ++c; next; } 
     if ($1 >= inter[i]) ++c; 
     else if (i == 1) { print c, "greater than", inter[i++]; c = 1; } 
     else { print c, "between", inter[i - 1], "and", inter[i++]; c = 1; } 
    } 
    END { print c, "lower than", inter[i - 1]; }' 

If the input is already sorted, you can shorten the command line even further:

awk 'BEGIN { split("2 1 0.5 -0.5 -1 -2", inter); i = 1; } 
{ 
    if (i > 6) { ++c; next; } 
    if ($2 >= inter[i]) ++c; 
    else if (i == 1) { print c, "greater than", inter[i++]; c = 1; } 
    else { print c, "between", inter[i - 1], "and", inter[i++]; c = 1; } 
} 
END { print c, "lower than", inter[i - 1]; }' input.txt 

And the resulting output, which you may want to format differently, is:

3 greater than 2 
1 between 2 and 1 
2 between 1 and 0.5 
6 between 0.5 and -0.5 
1 between -0.5 and -1 
1 between -1 and -2 
2 lower than -2 
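
One caveat with the sorted walk: it advances exactly one interval each time a value falls below the current threshold, so a value that jumps across an empty interval is attributed to the next interval rather than the one it belongs to. A variant that keeps the threshold list as data but looks up the interval for every row avoids this and does not need the input to be sorted. The sketch below assumes a hypothetical function name `bin_counts`, thresholds passed in descending order, and made-up sample rows.

```shell
#!/bin/sh
# bin_counts is a hypothetical helper name.
# $1 = input file (ID in column 1, ratio in column 2)
# $2 = space-separated thresholds in descending order
bin_counts() {
    awk -v thresholds="$2" '
    BEGIN { n = split(thresholds, t, " ") }
    NF >= 2 {
        # Walk down the thresholds until the value reaches one;
        # bin n+1 collects everything below the last threshold.
        i = 1
        while (i <= n && $2 + 0 < t[i] + 0) i++
        count[i]++
    }
    END {
        for (i = 1; i <= n + 1; i++) {
            if (i == 1)          label = ">= " t[1]
            else if (i == n + 1) label = "< " t[n]
            else                 label = t[i] " to " t[i - 1]
            printf "%s\t%d\n", label, count[i]   # empty bins print 0
        }
    }' "$1"
}

# Made-up rows that leave several intervals empty.
sample=$(mktemp)
printf '%s\n' 'A 3' 'B 0.7' 'C -3' > "$sample"
bin_counts "$sample" "2 1 0.5 -0.5 -1 -2"
rm -f "$sample"
```

Changing the categories then only means changing the threshold string, which is close to the loop-driven approach the question asks about.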
+0

+1 clever (and easy to modify, flexible) –

+0

This one should be accepted... it really answers the OP's wish for a while/for loop, i.e. a way to keep the program/method separate from the actual values. –

1
awk -f script.awk input.txt 

script.awk

{ 
    if ($2>=2) counter1++ 
    else if ($2>=1) counter2++ 
    else if ($2>=0.5) counter3++ 
    else if ($2>=-0.5) counter4++ 
    else if ($2>=-1) counter5++ 
    else if ($2>=-2) counter6++ 
    else counter7++ 
} 
END{ 
    print "Greater than 2: "counter1 
    print "Between 1 and 2: "counter2 
    print "Between 0.5 and 1: "counter3 
    print "Between -0.5 and 0.5: "counter4 
    print "Between -1 and -0.5: "counter5 
    print "Between -2 and -1: "counter6 
    print "Less than -2: "counter7
} 
1

The toto script:

awk '
    $2>2                 { count[1]++; label[1]="Greater than or equal to 2"; }
    ($2>1 && $2<=2)      { count[2]++; label[2]="Between 1 and 2"; }
    ($2>0.5 && $2<=1)    { count[3]++; label[3]="Between 0.5 and 1"; }
    ($2>-0.5 && $2<=0.5) { count[4]++; label[4]="Between -0.5 and 0.5"; }
    ($2>-1 && $2<=-0.5)  { count[5]++; label[5]="Between -1 and -0.5"; }
    ($2>-2 && $2<=-1)    { count[6]++; label[6]="Between -2 and -1"; }
    $2<=-2               { count[7]++; label[7]="Less than or equal to -2"; }

    END { for (i=1;i<=7;i++)
      { printf "%-30s %s\n" ,label[i], count[i];
      }
     }
    ' /tmp/input.txt

And the result:

. /tmp/toto

Greater than or equal to 2     3
Between 1 and 2                1
Between 0.5 and 1              2
Between -0.5 and 0.5           6
Between -1 and -0.5            1
Between -2 and -1              1
Less than or equal to -2       2
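
Note that the script above assigns each label inside the rule that matches, so a category with no matching rows would print an empty label and an empty count. One possible adjustment, sketched here with a hypothetical function name `label_counts` and made-up sample rows, is to define the labels once in a BEGIN block. The first label is written as "Greater than 2" because the first rule tests `$2>2`.

```shell
#!/bin/sh
# label_counts is a hypothetical name; the ranges mirror the script above.
label_counts() {
    awk '
    BEGIN {
        # Labels are fixed up front, so they exist even for empty categories.
        label[1] = "Greater than 2"
        label[2] = "Between 1 and 2"
        label[3] = "Between 0.5 and 1"
        label[4] = "Between -0.5 and 0.5"
        label[5] = "Between -1 and -0.5"
        label[6] = "Between -2 and -1"
        label[7] = "Less than or equal to -2"
        n = 7
    }
    $2 >  2                   { count[1]++ }
    $2 >  1    && $2 <=  2    { count[2]++ }
    $2 >  0.5  && $2 <=  1    { count[3]++ }
    $2 > -0.5  && $2 <=  0.5  { count[4]++ }
    $2 > -1    && $2 <= -0.5  { count[5]++ }
    $2 > -2    && $2 <= -1    { count[6]++ }
    $2 <= -2                  { count[7]++ }
    END {
        # %d turns an unset counter into 0.
        for (i = 1; i <= n; i++) printf "%-30s %d\n", label[i], count[i]
    }' "$1"
}

# Made-up rows that hit only the two outer categories.
sample=$(mktemp)
printf '%s\n' 'A 3' 'B -3' > "$sample"
label_counts "$sample"
rm -f "$sample"
```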
+0

Haha, it wasn't ordered... I took an odd route ^^. I'll fix it –

+0

Fixed. I played it safe for the sake of clarity ^^ –

+0

Haha! Thanks for the comment! It's up to the OP to pick what best suits their purpose (: Your option is very neat! ^^ – Rubens
