2016-03-02 28 views
0

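awk to show the differences between 2 files

I am trying to use awk to display the count and the differences (if any) between two files. The awk below displays the count of unique $3 values in file2, but how can I also display the IDs that were not found? Thank you :).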

file1

ACTA2 
ACTC1 
APC 
APOB 
BRCA1 
BRCA2 

file2 (ACTA2, ACTC1, and APC are all unique, so they are used in the count)

chr10:90694965-90695138 ACTA2-1269|gc=52.6 639.7 
chr10:90697803-90698014 ACTA2-1270|gc=50.2 347.6 
chr15:35082598-35082771 ACTC1-254|gc=50.3 603.8 
chr15:35085431-35085785 ACTC1-258|gc=54.8 633.8 
chr15:35086866-35087046 ACTC1-259|gc=67.2 291.0 
chr5:112043405-112043589 APC-1396|gc=70.1 334.8 
chr5:112090578-112090732 APC-1397|gc=39.6 171.6 
chr5:112102006-112102125 APC-1398|gc=33.6 52.3 
chr5:112102876-112103097 APC-1399|gc=41.2 177.4 

AWK

awk -F '[- ]' '!seen[$3]++ {n++} END {print n " ids found"}' file2

Desired result (the count comes from file2; that part already works)

3 ids found and APOB,BRCA1,BRCA2 missing

Answers

1

This gets you close to the desired output:

$ awk -F'[ -]' 'NR == FNR { seen[$0]; next } !seen[$3]++ { n++ } 
END { print n " ids found"; for (i in seen) if (!seen[i]) print i " missing" }' file1 file2 
3 ids found 
APOB missing 
BRCA1 missing 
BRCA2 missing 

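It basically loops through the seen array and checks each value. Every ID from file1 is stored as a key with an empty value, and that value is incremented only when the ID shows up as $3 in file2, so !seen[i] is true for exactly the IDs that were not seen in the second file.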
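The mechanism described above can be tried end to end with a small script. This is only a sketch: it writes a shortened copy of the sample data into a temporary directory (the `tmpdir` and `report` names are illustrative and not part of the original answer) and assumes the lines in file1 carry no trailing whitespace, since the whole line is used as the key.

```shell
#!/bin/sh
# Recreate a shortened version of the sample files (illustrative data).
tmpdir=$(mktemp -d)
printf '%s\n' ACTA2 ACTC1 APC APOB BRCA1 BRCA2 > "$tmpdir/file1"
cat > "$tmpdir/file2" <<'EOF'
chr10:90694965-90695138 ACTA2-1269|gc=52.6 639.7
chr10:90697803-90698014 ACTA2-1270|gc=50.2 347.6
chr15:35082598-35082771 ACTC1-254|gc=50.3 603.8
chr5:112043405-112043589 APC-1396|gc=70.1 334.8
EOF

report=$(awk -F'[ -]' '
    # First file: register every ID as a key; its value stays empty.
    NR == FNR { seen[$0]; next }
    # Second file: count an ID the first time it appears, then bump its value.
    !seen[$3]++ { n++ }
    END {
        print n " ids found"
        # Keys whose value is still empty never appeared in the second file.
        for (id in seen) if (!seen[id]) print id " missing"
    }' "$tmpdir/file1" "$tmpdir/file2")

rm -rf "$tmpdir"
printf '%s\n' "$report"
```

The count is always printed first; the order of the "missing" lines depends on how the awk implementation walks the array, so it may differ between systems.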

1

Here is a prototype:

$ awk -F '[- ]' 'NR==FNR{a[$0];next} 
       ($3 in a){delete a[$3]} 
        END {for(k in a) printf "%s ",k; print "missing"}' file{1,2} 

BRCA1 BRCA2 APOB missing 

With the right output format:

$ awk -F '[- ]' 'NR==FNR{a[$0];next} 
       ($3 in a){delete a[$3]; c++} 
        END{printf "%s ids found and ", c; 
         for(k in a) {printf "%s",sep k; sep=","} 
         print " missing"}' file{1,2} 

3 ids found and BRCA1,BRCA2,APOB missing 
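Note that the missing IDs above come out as BRCA1,BRCA2,APOB because `for (k in a)` visits keys in an unspecified order. If the order from file1 matters, one possible variation (a sketch, not part of the original answer) is to record each ID's position while reading file1 and walk that list in END. The `order`, `total`, `tmpdir`, and `summary` names are illustrative, and the sample data is shortened.

```shell
#!/bin/sh
# Recreate a shortened version of the sample files (illustrative data).
tmpdir=$(mktemp -d)
printf '%s\n' ACTA2 ACTC1 APC APOB BRCA1 BRCA2 > "$tmpdir/file1"
cat > "$tmpdir/file2" <<'EOF'
chr10:90694965-90695138 ACTA2-1269|gc=52.6 639.7
chr15:35082598-35082771 ACTC1-254|gc=50.3 603.8
chr15:35085431-35085785 ACTC1-258|gc=54.8 633.8
chr5:112043405-112043589 APC-1396|gc=70.1 334.8
EOF

summary=$(awk -F'[- ]' '
    # First file: remember each ID and the position at which it was listed.
    NR == FNR { a[$0]; order[++total] = $0; next }
    # Second file: an ID still present in a is matched for the first time.
    ($3 in a) { delete a[$3]; c++ }
    END {
        printf "%d ids found and ", c
        # Walk the IDs in file1 order; anything left in a was never matched.
        for (i = 1; i <= total; i++)
            if (order[i] in a) { printf "%s%s", sep, order[i]; sep = "," }
        print " missing"
    }' "$tmpdir/file1" "$tmpdir/file2")

rm -rf "$tmpdir"
printf '%s\n' "$summary"
```

With this data the script prints the IDs in the same order as the desired result in the question: 3 ids found and APOB,BRCA1,BRCA2 missing.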

Thank you both very much :). – Chris
