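awk to add specific fields based on a matching field in a file

2017-07-06 76 views

I'm trying to use awk to add fields $4, $5, $6 and the header of tab-delimited file2 to the lines in file1 whose $3 value has a match in $2 of file2. I've added a comment to each line with my understanding of what is happening. Thanks :).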

file1 (tab-delimited)

ID Name Number 
0-0 A,A 123456 
2-2 B,B 789123 
4-4 C,C 456789 

file2 (tab-delimited)

ID Number Name Info1 Info2 Info3 Info4 
0-0 123456 A,A aaaaa bbbbb ccccc eeeee 
1-1 111111 Z,Z aaa bbb ccc eee 
2-2 789123 B,B aaaaa bb,bbb ccccc eeeee 
3-3 222222 Y,Y aaa bb,bb cc e 
4-4 456789 C,C aaa bb ccc eeee 

Desired output (tab-delimited)

ID Name Number Info1 Info2 Info3 
0-0 A,A 123456 aaaaa bbbbb ccccc 
2-2 B,B 789123 aaaaa bb,bbb ccccc 
4-4 C,C 456789 aaa bb ccc 

AWK

awk -F"\t" '$3 in a{ # read $3 value of file1 into array a 
a[$3]=a[$2]; # match $3 array a from file1 with $2 value in file2 
    next # process next line 
} # close block 
    { print $1,$2,a[$2],$4,$5,$6 # print desired output 
} # close block 
    END { # start block 
for (i in a) { # create for loop i to print 
    print a[i] # print for each matching line in i 
    } # close block 
}' file1 file2 
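To reproduce the problem, a minimal sketch that writes the two sample files with real tab characters, assuming a POSIX shell with `printf`. The file names `file1` and `file2` are the ones used in the question; note that the script creates them in the current directory and overwrites any existing files with those names.

```shell
# Create the sample inputs from the question with literal tab separators.
# Assumption: run in a scratch directory; this overwrites any existing file1/file2.
printf 'ID\tName\tNumber\n0-0\tA,A\t123456\n2-2\tB,B\t789123\n4-4\tC,C\t456789\n' > file1

{
  printf 'ID\tNumber\tName\tInfo1\tInfo2\tInfo3\tInfo4\n'
  printf '0-0\t123456\tA,A\taaaaa\tbbbbb\tccccc\teeeee\n'
  printf '1-1\t111111\tZ,Z\taaa\tbbb\tccc\teee\n'
  printf '2-2\t789123\tB,B\taaaaa\tbb,bbb\tccccc\teeeee\n'
  printf '3-3\t222222\tY,Y\taaa\tbb,bb\tcc\te\n'
  printf '4-4\t456789\tC,C\taaa\tbb\tccc\teeee\n'
} > file2

# Sanity check: split on tabs, the header of file1 has 3 fields and file2 has 7.
awk -F'\t' 'FNR == 1 { print FILENAME, NF }' file1 file2
# prints:
# file1 3
# file2 7
```

The answers below can then be run unchanged against these two files.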

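Get Effective Awk Programming, 4th Edition, by Arnold Robbins. You've had many similar questions answered here (and there are hundreds more in the archives), so you shouldn't need to ask this unless you've missed the basics covered in that book. –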


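I'm reading that book along with some others and I am learning, but this is a bit outside my area of expertise. I'll keep reading and experimenting. Thanks everyone for the help, explanations and patience :)... it's a steep learning curve, but it's very worthwhile and something science needs. Thanks :). – Chris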

Answers

$ awk -v OFS="\t" 'NR==FNR{a[$3]=$0;next}$2 in a{print a[$2],$4,$5,$6}' file1 file2 
ID  Name Number Info1 Info2 Info3 
0-0  A,A  123456 aaaaa bbbbb ccccc 
2-2  B,B  789123 aaaaa bb,bbb ccccc 
4-4  C,C  456789 aaa  bb  ccc 

Explanation:

$ awk -v OFS="\t" '   # tab as OFS also 
NR==FNR{     # for file1 
    a[$3]=$0    # hash $0 to a using $3 as key 
    next     # no further processing for this record 
} 
$2 in a {     # if $2 found in a 
    print a[$2],$4,$5,$6 # output as requested 
}' file1 file2    # mind the file order 
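The `$2 in a` pattern above only tests whether a key exists. A small sketch in plain awk (the array `a` and the key `"x"` are hypothetical) showing why that distinction matters: a membership test leaves the array untouched, whereas merely referencing `a["x"]` in an expression, as the question's `a[$3]=a[$2]` does, creates an empty element.

```shell
# Hypothetical demo (array a and key "x" are made up): an in-test does not
# create an element, but referencing a["x"] in an expression does.
out=$(awk 'BEGIN {
  if ("x" in a) { print "x found" } else { print "x not found" }
  n = 0; for (k in a) n++
  print "elements after in-test: " n
  v = a["x"]                 # a plain reference creates an empty a["x"]
  n = 0; for (k in a) n++
  print "elements after reference: " n
}')
printf '%s\n' "$out"
# prints:
# x not found
# elements after in-test: 0
# elements after reference: 1
```

This is why `$2 in a` is the usual lookup rather than something like `a[$2] != ""`: the comparison form silently adds an empty entry for every key that does not match.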

Try this: one more approach, which reads file2 first and then file1.

awk -F"\t" 'FNR==NR{a[$1,$3,$2]=$4 OFS $5 OFS $6;next} (($1,$2,$3) in a){print $1,$2,$3,a[$1,$2,$3]}' OFS="\t" file2 file1 
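In the one-liner above, `a[$1,$3,$2]` is a multi-subscript key: awk joins the subscripts with the built-in `SUBSEP` separator into one string, so subscript order matters. That is why file2's columns are indexed as ID, Name, Number ($1,$3,$2), the same order as file1's $1,$2,$3. A hypothetical demonstration using the values of the first sample row:

```shell
out=$(awk 'BEGIN {
  # a[i, j, k] is stored under the single string i SUBSEP j SUBSEP k
  a["0-0", "A,A", "123456"] = "hit"
  if (("0-0", "A,A", "123456") in a) { print "same order: found" }
  if (("0-0", "123456", "A,A") in a) {
    print "swapped order: found"
  } else {
    print "swapped order: not found"
  }
  for (k in a) {
    split(k, parts, SUBSEP)  # recover the individual subscripts
    print parts[1] "|" parts[2] "|" parts[3]
  }
}')
printf '%s\n' "$out"
# prints:
# same order: found
# swapped order: not found
# 0-0|A,A|123456
```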

Will add an explanation in a few minutes.

EDIT: Adding a non-one-liner form of the solution as well.

awk -F"\t" 'FNR==NR {                   # true only while reading the first file, file2: NR counts lines across all files, FNR restarts at 1 for each file
       a[$1,$3,$2] = $4 OFS $5 OFS $6    # key: file2 ID, Name, Number (the same order as the file1 columns); value: Info1..Info3 joined by OFS
       next                              # skip the remaining rules for this line
      }
    (($1,$2,$3) in a) {                  # reached only for file1 lines: print when ID, Name and Number all match an entry from file2
         print $1, $2, $3, a[$1,$2,$3]   # file1 columns followed by the stored Info1..Info3
         }
    ' OFS="\t" file2 file1               # OFS="\t" is assigned before file2 is read, so it also applies while building a; file order matters
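The `FNR==NR` condition can be seen directly by printing both counters. A sketch with two hypothetical throwaway files (`first` with 2 lines, `second` with 3) in a temporary directory; using `mktemp -d` is an assumption (it is available on Linux and macOS but is not required by POSIX).

```shell
# Two hypothetical throwaway files in a temporary directory.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/first"
printf 'c\nd\ne\n' > "$tmp/second"
# NR keeps counting across files; FNR restarts at 1 for each new file.
out=$(cd "$tmp" && awk '{ print FILENAME, NR, FNR, (NR == FNR ? "first file" : "later file") }' first second)
printf '%s\n' "$out"
rm -r "$tmp"
# prints:
# first 1 1 first file
# first 2 2 first file
# second 3 1 later file
# second 4 2 later file
# second 5 3 later file
```

One caveat: if the first file is empty, NR and FNR stay equal while the second file is read, so `FNR==NR` would wrongly treat the second file as the first.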

Thank you all very much :). – Chris