2012-05-03 119 views
0

Merging two files with similar columns

I have two tab-delimited files that I need to align with each other. For example:

File 1:  File 2: 
AAA 123  BBB 345 
BBB 345  CCC 333 
CCC 333  DDD 444 

(The real files are larger, possibly thousands of lines!)

What I would like is output like this:

AAA 123 
BBB 345 BBB 345 
CCC 333 CCC 333 
     DDD 444 

Ideally I would like to do this in Perl, but I am not sure how. Any help would be greatly appreciated.

+0

Have a look at http://stackoverflow.com/questions/4960275/how-can-match-records-in-two-files-using-perl

+0

Do you really need to repeat the row label each time? Building a hash of arrayrefs would be easy.

Answers

0

Assuming the files are sorted,

sub get { 
    my ($fh) = @_; 
    my $line = <$fh>; 
    return() if !defined($line); 
    return split(' ', $line); 
} 

my ($key1, $val1) = get($fh1); 
my ($key2, $val2) = get($fh2); 

while (defined($key1) && defined($key2)) { 
    if ($key1 lt $key2) { 
     print(join("\t", $key1, $val1), "\n"); 
     ($key1, $val1) = get($fh1); 
    } 
    elsif ($key1 gt $key2) { 
     print(join("\t", '', '', $key2, $val2), "\n"); 
     ($key2, $val2) = get($fh2); 
    } 
    else { 
     print(join("\t", $key1, $val1, $key2, $val2), "\n"); 
     ($key1, $val1) = get($fh1); 
     ($key2, $val2) = get($fh2); 
    } 
} 

while (defined($key1)) { 
    print(join("\t", $key1, $val1), "\n"); 
    ($key1, $val1) = get($fh1); 
} 

while (defined($key2)) { 
    print(join("\t", '', '', $key2, $val2), "\n"); 
    ($key2, $val2) = get($fh2); 
} 
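
The merge above relies on both files being sorted by key in the same string order that `lt` and `gt` use. A minimal sketch of checking that assumption before merging could look like the following. The helper name `is_sorted_by_key` and the in-memory sample data are assumptions for illustration and are not part of the answer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the keys (first whitespace-separated
# field of each line) never decrease in string order, 0 otherwise.
sub is_sorted_by_key {
    my ($fh) = @_;
    my $prev;
    while (my $line = <$fh>) {
        my ($key) = split(' ', $line);
        next if !defined($key);                       # skip blank lines
        return 0 if defined($prev) && $key lt $prev;  # key went backwards
        $prev = $key;
    }
    return 1;
}

# Demo on in-memory filehandles, so no files on disk are needed.
my $sorted_text   = "AAA\t123\nBBB\t345\nCCC\t333\n";
my $unsorted_text = "BBB\t345\nAAA\t123\n";

open(my $sorted_fh,   '<', \$sorted_text)   or die $!;
open(my $unsorted_fh, '<', \$unsorted_text) or die $!;

print is_sorted_by_key($sorted_fh)   ? "sorted\n" : "not sorted\n";  # sorted
print is_sorted_by_key($unsorted_fh) ? "sorted\n" : "not sorted\n";  # not sorted
```

If the check fails, sorting the files first (or using one of the hash-based approaches) avoids silently misaligned output.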
0

As ikegami mentioned, this assumes the contents of the files are laid out as shown in the example.

use strict; 
use warnings; 

open my $file1, '<', 'file1.txt' or die $!; 
open my $file2, '<', 'file2.txt' or die $!; 

my $file1_line = <$file1>; 
print $file1_line; 

while (my $file2_line = <$file2>) { 
    if(defined($file1_line = <$file1>)) { 
     chomp $file1_line; 
     print $file1_line; 
    } 

    my $tabs = $file1_line ? "\t" : "\t\t"; 
    print "$tabs$file2_line"; 
} 

close $file1; 
close $file2; 

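Looking again at your example, the two files share some identical key/value pairs. Given that, it looks like you want to show the pairs unique to file 1, the pairs unique to file 2, and the pairs common to both. If that is the case (and you are not trying to match the files' pairs by either key or value), you can use List::Compare: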

use strict; 
use warnings; 
use List::Compare; 

open my $file1, '<', 'file1.txt' or die $!; 
my @file1 = <$file1>; 
close $file1; 

open my $file2, '<', 'file2.txt' or die $!; 
my @file2 = <$file2>; 
close $file2; 

my $lc = List::Compare->new(\@file1, \@file2); 

my @file1Only = $lc->get_Lonly; # L(eft array)only 
for(@file1Only) { print } 

my @bothFiles = $lc->get_intersection; 
for(@bothFiles) { chomp; print "$_\t$_\n" } 

my @file2Only = $lc->get_Ronly; # R(ight array)only 
for(@file2Only) { print "\t\t$_" } 
1

If all you need is a data structure, this can be quite easy.

#!/usr/bin/env perl 

# usage: script.pl file1 file2 ... 

use strict; 
use warnings; 

my %data; 
while (<>) { 
    chomp; 
    my ($key, $value) = split; 
    push @{$data{$key}}, $value; 
} 

use Data::Dumper; 
print Dumper \%data; 

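You can then output the data in whatever format you like. If you really need to work with the files exactly as they are, it gets a bit trickier.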
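
As one illustration of the "whatever format you like" step, the sketch below prints each key followed by every value collected for it, one tab-separated row per key. The helper name `format_rows` and the sample data are assumptions for illustration. The data is shaped like what the loop above would build from the two example files.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a hash of arrayrefs into tab-separated rows,
# one row per key, with keys in string order.
sub format_rows {
    my ($data) = @_;
    return map { join("\t", $_, @{ $data->{$_} }) . "\n" } sort keys %$data;
}

# Sample data: keys present in both files end up with two values.
my %data = (
    AAA => [123],
    BBB => [345, 345],
    CCC => [333, 333],
    DDD => [444],
);

print format_rows(\%data);
```

Note that this structure does not record which file a value came from, so it cannot by itself reproduce the left/right alignment asked for in the question.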

0

Similar to Joel Berger's answer, but this approach lets you track whether or not each file contained a given key:

my %data; 

while (my $line = <>){ 
    chomp $line; 
    my ($k)   = $line =~ /^(\S+)/; 
    $data{$k}{line} = $line; 
    $data{$k}{$ARGV} = 1; 
} 

use Data::Dumper; 
print Dumper(\%data); 

Output:

$VAR1 = { 
    'CCC' => { 
    'other.dat' => 1, 
    'data.dat' => 1, 
    'line' => 'CCC 333' 
    }, 
    'BBB' => { 
    'other.dat' => 1, 
    'data.dat' => 1, 
    'line' => 'BBB 345' 
    }, 
    'DDD' => { 
    'other.dat' => 1, 
    'line' => 'DDD 444' 
    }, 
    'AAA' => { 
    'data.dat' => 1, 
    'line' => 'AAA 123' 
    } 
};
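
Building on that structure, a rough sketch of producing the aligned layout from the question might look like this. The helper name `aligned_rows` is an assumption for illustration, as is the choice of a single tab between the left and right sides. Because the structure keeps only one `line` per key, the sketch also assumes that a key has the same line in both files.

```perl
use strict;
use warnings;

# Hypothetical helper: given the %data structure shown above and the two
# file names, return one row per key. The stored line appears on the left
# if the first file contained the key and on the right if the second file
# did; a missing side is left empty.
sub aligned_rows {
    my ($data, $left_file, $right_file) = @_;
    my @rows;
    for my $key (sort keys %$data) {
        my $entry = $data->{$key};
        my $left  = $entry->{$left_file}  ? $entry->{line} : '';
        my $right = $entry->{$right_file} ? $entry->{line} : '';
        push @rows, "$left\t$right\n";
    }
    return @rows;
}

# Same data as in the dump above.
my %data = (
    AAA => { 'data.dat' => 1, line => 'AAA 123' },
    BBB => { 'data.dat' => 1, 'other.dat' => 1, line => 'BBB 345' },
    CCC => { 'data.dat' => 1, 'other.dat' => 1, line => 'CCC 333' },
    DDD => { 'other.dat' => 1, line => 'DDD 444' },
);

print aligned_rows(\%data, 'data.dat', 'other.dat');
```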