awk to display the differences between 2 files

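I am trying to use awk to display the count and the differences, if any, between two files. The awk below displays the count of unique $3 values in file2, but how do I also display the IDs that were not found? Thank you :).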

file1

ACTA2 
ACTC1 
APC 
APOB 
BRCA1 
BRCA2

file2 (ACTA2, ACTC1, and APC are each unique, so they are used in the count)

chr10:90694965-90695138 ACTA2-1269|gc=52.6 639.7 
chr10:90697803-90698014 ACTA2-1270|gc=50.2 347.6 
chr15:35082598-35082771 ACTC1-254|gc=50.3 603.8 
chr15:35085431-35085785 ACTC1-258|gc=54.8 633.8 
chr15:35086866-35087046 ACTC1-259|gc=67.2 291.0 
chr5:112043405-112043589 APC-1396|gc=70.1 334.8 
chr5:112090578-112090732 APC-1397|gc=39.6 171.6 
chr5:112102006-112102125 APC-1398|gc=33.6 52.3 
chr5:112102876-112103097 APC-1399|gc=41.2 177.4

AWK

awk -F '[- ]' '!seen[$3]++ {n++} END {print n " ids found"}' file2

Desired result (the count comes from file2; that part already works)

3 ids found and APOB, BRCA1, BRCA2 missing

Source

2016-03-02 Chris

This gets you close to the desired output:

$ awk -F'[ -]' 'NR == FNR { seen[$0]; next } !seen[$3]++ { n++ } 
END { print n " ids found"; for (i in seen) if (!seen[i]) print i " missing" }' file1 file2 
3 ids found 
APOB missing 
BRCA1 missing 
BRCA2 missing

It basically loops through the seen array and checks the values. !seen[i] is true if the ID was not seen in the second file.
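Note that awk does not define the order in which a for (i in seen) loop visits the keys, so the missing IDs may be printed in any order. Below is a minimal sketch, not part of the original answer, that assumes you want them in file1 order and on a single line as in the desired result. The order array, the m counter, and the temporary directory are illustrative names, and the sample files are a shortened copy of the data in the question.

```shell
# Recreate small sample inputs in a temporary directory (data taken from the question).
tmp=$(mktemp -d)
printf '%s\n' ACTA2 ACTC1 APC APOB BRCA1 BRCA2 > "$tmp/file1"
cat > "$tmp/file2" <<'EOF'
chr10:90694965-90695138 ACTA2-1269|gc=52.6 639.7
chr10:90697803-90698014 ACTA2-1270|gc=50.2 347.6
chr15:35082598-35082771 ACTC1-254|gc=50.3 603.8
chr5:112043405-112043589 APC-1396|gc=70.1 334.8
EOF

result=$(awk -F'[ -]' '
  # file1: remember each ID and the order in which it was listed
  NR == FNR { order[++m] = $1; seen[$1] = 0; next }
  # file2: count each distinct $3 once, as in the answer above
  !seen[$3]++ { n++ }
  END {
    printf "%d ids found", n
    # walk file1 IDs in their original order and list those never seen in file2
    for (j = 1; j <= m; j++)
      if (!seen[order[j]]) {
        printf "%s%s", (sep == "" ? " and " : ","), order[j]
        sep = ","
      }
    print (sep == "" ? "" : " missing")
  }' "$tmp/file1" "$tmp/file2")

echo "$result"
rm -rf "$tmp"
```

With this sample data the script prints: 3 ids found and APOB,BRCA1,BRCA2 missing. Using $1 instead of $0 for the file1 key is a defensive choice in case the lines carry trailing spaces.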

Source

2016-03-02 21:13:58

Here is a prototype:

$ awk -F '[- ]' 'NR==FNR{a[$0];next} 
       ($3 in a){delete a[$3]} 
        END {for(k in a) printf "%s ",k; print "missing"}' file{1,2} 

BRCA1 BRCA2 APOB missing

With the right output format:

$ awk -F '[- ]' 'NR==FNR{a[$0];next} 
       ($3 in a){delete a[$3]; c++} 
        END{printf "%s ids found and ", c; 
         for(k in a) {printf "%s",sep k; sep=","} 
         print " missing"}' file{1,2} 

3 ids found and BRCA1,BRCA2,APOB missing
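One behavioral difference between the two answers may matter for other inputs: the first counts every distinct $3 in file2, while this one counts only IDs that are also listed in file1. Here is a small sketch of that difference, assuming a file2 that contains an ID absent from file1. The TP53 line and the variable names count_all and count_listed are made up for illustration and do not come from the question.

```shell
# Tiny inputs: file2 contains one ID from file1 (ACTA2) and one that is not in file1 (TP53, hypothetical).
tmp=$(mktemp -d)
printf '%s\n' ACTA2 APOB > "$tmp/file1"
cat > "$tmp/file2" <<'EOF'
chr10:90694965-90695138 ACTA2-1269|gc=52.6 639.7
chr17:7570000-7570100 TP53-1|gc=50.0 100.0
EOF

# Counting rule of the first answer: every distinct $3 in file2 is counted.
count_all=$(awk -F'[ -]' 'NR == FNR { seen[$0]; next } !seen[$3]++ { n++ } END { print n + 0 }' "$tmp/file1" "$tmp/file2")

# Counting rule of this answer: only IDs that also appear in file1 are counted.
count_listed=$(awk -F'[- ]' 'NR == FNR { a[$0]; next } ($3 in a) { delete a[$3]; c++ } END { print c + 0 }' "$tmp/file1" "$tmp/file2")

echo "distinct IDs in file2: $count_all"
echo "IDs from file1 found in file2: $count_listed"
rm -rf "$tmp"
```

For the data in the question both rules give 3, because every ID in file2 is also in file1. Which rule is appropriate depends on whether IDs outside file1 should count as found.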

Source

2016-03-02 21:14:21 karakfa

Thank you both very much :). – Chris
