Join two files using awk

I have two tab-delimited files, as shown below:

File A

chr1 123 aa b c d 
chr1 234 a b c d 
chr1 345 aa b c d 
chr1 456 a b c d 
....

File B

xxxx abcd chr1 123 aa c d e 
yyyy defg chr1 345 aa e f g 
...

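I want to join these two files on the three columns that hold "chr1", "123" and "aa", and append the first two columns of file B to file A, so that the output looks like this: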

chr1 123 aa b c d xxxx abcd 
chr1 234 a  b c d 
chr1 345 aa b c d yyyy defg 
chr1 456 a b c d

Can anyone help me do this in awk? If possible, with an awk one-liner?

2012-11-06 chas

What have you tried so far (http://whathaveyoutried.com/)? – doublesharp

Here is one way using awk:

$ awk 'NR==FNR{a[$3,$4]=$1OFS$2;next}{$6=a[$1,$2];print}' OFS='\t' fileb filea 
chr1 123  a b c  xxxx abcd 
chr1 234  a b c 
chr1 345  a b c  yyyy defg 
chr1 456  a b c

Explanation:

NR==FNR          # True while the running record number equals the per-file record number, i.e. while reading the first file (fileb)
a[$3,$4]=$1OFS$2 # Create an array entry keyed on fields 3 and 4, holding fields 1 and 2 of fileb
next             # Skip to the next line (do not run the second block)
$6=a[$1,$2]      # Assign the looked-up value to field 6 (this also rebuilds the record with OFS)
print            # Print the current line, including the matching entry from fileb ($6)

OFS='\t'         # Separate output fields with a single TAB
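
The NR==FNR test is the part that most often confuses people, so here is a small sketch that prints both counters. The file contents and the helper name show_counters are made up for this demonstration; only the awk variables NR, FNR and FILENAME are standard.

```shell
# Hypothetical sample files, created only for this demonstration.
tmp=$(mktemp -d)
printf 'b1\nb2\n' > "$tmp/fileb"
printf 'a1\na2\na3\n' > "$tmp/filea"

# NR counts records across all input files; FNR restarts at 1 for each file.
# The two are equal only while awk reads the first file named on the command line.
show_counters() {
    awk '{ print FILENAME, NR, FNR, (NR == FNR ? "first file" : "later file") }' "$@"
}

(cd "$tmp" && show_counters fileb filea)
# fileb 1 1 first file
# fileb 2 2 first file
# filea 3 1 later file
# filea 4 2 later file
# filea 5 3 later file
```

One caveat: if the first file is empty, NR==FNR is also true for the second file, so the idiom assumes fileb has at least one line.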

Edit:

For the three-field problem, you simply add the extra field to the key:

$ awk 'NR==FNR{a[$3,$4,$5]=$1OFS$2;next}{$6=a[$1,$2,$3];print}' OFS='\t' fileb filea 
chr1 123 aa  b  c  xxxx  abcd 
chr1 234 a  b  c 
chr1 345 aa  b  c  yyyy  defg 
chr1 456 a  b  c
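
In the sample file A shown in the question, field 6 already holds d, so assigning to $6 would overwrite it. The following is a minimal sketch, not the answerer's command, that appends the two columns instead and leaves unmatched lines untouched. The helper name append_b_columns and the temporary paths are assumptions for illustration; the sample rows are copied from the question.

```shell
tmp=$(mktemp -d)
# Sample rows from the question, written tab-separated (printf repeats the format per row).
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
    chr1 123 aa b c d  chr1 234 a b c d  chr1 345 aa b c d  chr1 456 a b c d > "$tmp/filea"
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    xxxx abcd chr1 123 aa c d e  yyyy defg chr1 345 aa e f g > "$tmp/fileb"

append_b_columns() {   # hypothetical helper; usage: append_b_columns fileb filea
    awk 'BEGIN { FS = OFS = "\t" }
         # First file (fileb): remember columns 1-2 under the three-column key.
         NR == FNR { seen[$3,$4,$5] = $1 OFS $2; next }
         # Second file (filea): append the stored columns only when the key exists.
         {
             key = $1 SUBSEP $2 SUBSEP $3
             if (key in seen) $0 = $0 OFS seen[key]
             print
         }' "$@"
}

append_b_columns "$tmp/fileb" "$tmp/filea"
```

Testing with `key in seen` rather than reading seen[key] directly avoids creating empty array entries, and it keeps a trailing tab from being added to lines that have no partner in fileb.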

2013-08-06 20:25:02

I have modified the original question. Could you provide a solution for this? – chas

You just need to add the extra field; see the edit. –

Thanks. awk 'NR==FNR{a[$3,$4,$5]=$1OFS$2OFS$3;next}{$6=a[$1,$2];print}' OFS='\t' fileb filea. – chas

You can use join, but the pipeline becomes so complicated that it may be easier to switch to a more powerful language such as Perl.

join -11 -21 -o1.1,1.2,1.3,1.4,1.5,2.4,2.5 \
    <(sed 's/ \+/:/' fileA | sort) \
    <(sed 's/ \+/:/' fileB | sort) \
| join -11 -22 -a1 -o1.1,1.2,1.3,1.4,1.5,1.6,1.7,2.5,2.6 \
    - <(sed 's/ \+\([^ ]\+\) \+\([^ ]\+\)/ \1:\2/' fileC | sort -k2) \
| sed 's/:/ /'
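
For just the two tab-delimited files in the question, the same idea can be sketched more simply. This is an illustration under stated assumptions, not the pipeline above: the helper name join_ab is hypothetical, a colon is assumed never to occur in the three key columns, and temporary files are used instead of process substitution so that it runs in a plain POSIX shell.

```shell
tmp=$(mktemp -d)
# Sample rows from the question, written tab-separated.
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
    chr1 123 aa b c d  chr1 234 a b c d  chr1 345 aa b c d  chr1 456 a b c d > "$tmp/filea"
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    xxxx abcd chr1 123 aa c d e  yyyy defg chr1 345 aa e f g > "$tmp/fileb"

join_ab() {   # hypothetical helper; usage: join_ab filea fileb
    tab=$(printf '\t')
    work=$(mktemp -d)
    # Prefix every line with one composite key built from the three join columns.
    awk 'BEGIN { FS = OFS = "\t" } { print $1 ":" $2 ":" $3, $0 }' "$1" |
        LC_ALL=C sort -t "$tab" -k1,1 > "$work/a.keyed"
    # From file B keep only the key and the two columns to be appended.
    awk 'BEGIN { FS = OFS = "\t" } { print $3 ":" $4 ":" $5, $1, $2 }' "$2" |
        LC_ALL=C sort -t "$tab" -k1,1 > "$work/b.keyed"
    # -a 1 keeps lines of file A without a partner in file B; cut drops the helper key.
    LC_ALL=C join -t "$tab" -a 1 "$work/a.keyed" "$work/b.keyed" | cut -f2-
    rm -rf "$work"
}

join_ab "$tmp/filea" "$tmp/fileb"
```

Unlike the awk approach, the output comes back in sorted key order rather than in the original order of file A, because join requires both inputs to be sorted on the join field.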

A Perl solution, using a hash to remember all the information:

#!/usr/bin/perl 
use warnings; 
use strict; 

#                  key_start  key_end  keep_from  output
my %files = (A => [0,         1,       2,         [0 .. 3]],
             B => [0,         1,       2,         [-2, -1]],
             C => [1,         2,       3,         [-2, -1]],
            );

my %hash; 

for my $file (keys %files) { 
    open my $FH, '<', "file$file" or die "file$file: $!"; 
    while (<$FH>) {
        my @fields = split;
        $hash{"@fields[$files{$file}[0], $files{$file}[1]]"}{$file}
            = [ @fields[$files{$file}[2] .. $#fields] ];
    }
}

for my $key (sort keys %hash) {
    print $key, join(' ', q(),
                     grep defined, map {
                         @{ $hash{$key}{$_} }[@{ $files{$_}[-1] }]
                     } sort keys %files), "\n";
}

2012-11-06 20:31:08 choroba

@user1779730: Added a Perl solution. – choroba
