2014-01-06 63 views
0

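Perl: unordered diff between text files

I basically want to do an unordered diff between two text files (CSV style), comparing only the fields in the first two columns (I don't care about the value in column 3). I then want to print the entries that file1.txt has but that do not appear in file2.txt, and vice versa for file2.txt compared with file1.txt.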

file1.txt:

cat,val 1,43432 
cat,val 2,4342 
dog,value,23 
cat2,value,2222 
hedgehog,input,233 

file2.txt:

cat2,value,312 
cat,val 2,11 
cat,val 3,22 
dog,value,23 
hedgehog,input,2145 
bird,output,9999 

The output would look like this:

file1.txt: 
cat,val 1,43432 

file2.txt: 
cat,val 3,22 
bird,output,9999 

I'm new to Perl, so some of the better, more advanced ways of doing this are not yet known to me. Thanks for any help.

Current code:

#!/usr/bin/perl -w 

use Cwd; 
use strict; 
use Data::Dumper; 
use Getopt::Long; 

my $myName = 'MyDiff.pl'; 
my $usage = "$myName is blah blah blah"; 

#retrieve the command line options, set up the environment 
use vars qw($file1 $file2); 

#grab the specified values or exit program 
GetOptions("file1=s" => \$file1, 
     "file2=s" => \$file2) 
     or die $usage; 
($file1 and $file2) or die $usage; 

open (FH, "< $file1") or die "Can't open $file1 for read: $!"; 
my @array1 = <FH>; 
close FH or die "Cannot close $file1: $!"; 
open (FH, "< $file2") or die "Can't open $file2 for read: $!"; 
my @array2 = <FH>; 
close FH or die "Cannot close $file2: $!"; 

#...do a sort and match 

Answers

2

Perhaps the following will be helpful:

use strict; 
use warnings; 

my @files = @ARGV; 
pop; # with no argument at file scope, pop works on @ARGV, leaving only the first file for <>
my %file1 = map { chomp; /(.+),/; $1 => $_ } <>; 

push @ARGV, $files[1]; 
my %file2 = map { chomp; /(.+),/; $1 => $_ } <>; 

print "$files[0]:\n"; 
print $file1{$_}, "\n" for grep !exists $file2{$_}, keys %file1; 

print "\n$files[1]:\n"; 
print $file2{$_}, "\n" for grep !exists $file1{$_}, keys %file2; 

Usage: perl script.pl file1.txt file2.txt

Output on your dataset:

file1.txt: 
cat,val 1,43432 

file2.txt: 
cat,val 3,22 
bird,output,9999 

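This builds a hash for each file. The keys are the first two columns (the greedy (.+), in the regex captures everything before the last comma), and the associated values are the full lines. grep is used to filter out the shared keys.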
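To make the key extraction concrete, here is a minimal sketch comparing the regex from the code above with a split-based key. The helper names key_by_regex and key_by_split are hypothetical and only serve the illustration; the four-column line is an assumed input that does not occur in the question's data.

```perl
use strict;
use warnings;

# Key as produced by the regex in the answer: everything before the LAST comma.
sub key_by_regex {
    my ($line) = @_;
    return $line =~ /(.+),/ ? $1 : undef;
}

# Alternative key: always the first two comma-separated fields.
sub key_by_split {
    my ($line) = @_;
    return join ",", (split /,/, $line)[0, 1];
}

# Compare both keys on a three-column and a (hypothetical) four-column line.
for my $line ("cat,val 1,43432", "cat,val 1,43432,extra") {
    print "line:  $line\n";
    print "regex: ", key_by_regex($line), "\n";
    print "split: ", key_by_split($line), "\n\n";
}
```

For three-column lines both helpers give the same key. If a line had a fourth column, the regex key would also include the third field, so the split-based key is probably the safer choice when the column count can vary.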

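Edit: For relatively small files, using map as above to process a file's lines works fine. However, a list of all the file's lines is created first and then passed to map. On larger files, it may be better to use a while (<>) { ... construct, which reads one line at a time. The code below does this, producing the same output as above, and uses a hash of hashes (HoH). Because it uses a HoH, you'll notice some dereferencing: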

use strict; 
use warnings; 

my %hash; 
my @files = @ARGV; 

while (<>) { 
    chomp; 
    $hash{$ARGV}{$1} = $_ if /(.+),/; 
} 

print "$files[0]:\n"; 
print $hash{ $files[0] }{$_}, "\n" 
    for grep !exists $hash{ $files[1] }{$_}, keys %{ $hash{ $files[0] } }; 

print "\n$files[1]:\n"; 
print $hash{ $files[1] }{$_}, "\n" 
    for grep !exists $hash{ $files[0] }{$_}, keys %{ $hash{ $files[1] } }; 

Wow, this is great! Very concise, thank you. – user1384831

@user1384831 - You're most welcome. Added another option. – Kenosis

4

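Use a hash for this, with the first two columns as the key. Once you have the two hashes, you can iterate over them and delete the common entries; whatever remains in each hash is what you are looking for.

Initialize,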

my %hash1 =(); 
my %hash2 =(); 

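Read in the first file, join the first two columns to form the key, and store the line in the hash under that key. This assumes the fields are separated by commas. You could also use a CSV module for this.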

open(my $fh1, "<", $file1) || die "Can't open $file1: $!"; 
while(my $line = <$fh1>) { 
    chomp $line; 

    # join first two columns for key 
    my $key = join ",", (split ",", $line)[0,1]; 

    # create hash entry for file1 
    $hash1{$key} = $line; 
} 

Do the same for file2 to create %hash2

open(my $fh2, "<", $file2) || die "Can't open $file2: $!"; 
while(my $line = <$fh2>) { 
    chomp $line; 

    # join first two columns for key 
    my $key = join ",", (split ",", $line)[0,1]; 

    # create hash entry for file2 
    $hash2{$key} = $line; 
} 

Now go through the entries and delete the common ones,

foreach my $key (keys %hash1) { 
    if (exists $hash2{$key}) { 
        # common entry, delete from both hashes 
        delete $hash1{$key}; 
        delete $hash2{$key}; 
    } 
} 

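%hash1 will now hold only the lines that appear just in file1, and %hash2 only the lines that appear just in file2.

You can print them out as,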

foreach my $key (keys %hash1) { 
    print "$hash1{$key}\n"; 
} 

foreach my $key (keys %hash2) { 
    print "$hash2{$key}\n"; 
} 
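
One thing to be aware of with the hash-based approaches: keys returns entries in no guaranteed order, so the lines may be printed in a different order than they appear in the input files. Below is a minimal sketch that keeps file order, assuming the lines are already in memory and that the first two comma-separated fields form the key. The helpers key_of and only_in_first are hypothetical names, not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: the key is the first two comma-separated fields.
sub key_of {
    my ($line) = @_;
    return join ",", (split /,/, $line)[0, 1];
}

# Return the lines of @$lines_a whose key does not occur in @$lines_b,
# preserving their original order.
sub only_in_first {
    my ($lines_a, $lines_b) = @_;
    my %seen = map { ( key_of($_) => 1 ) } @$lines_b;
    return grep { !$seen{ key_of($_) } } @$lines_a;
}

# The sample data from the question, already chomped.
my @file1 = ("cat,val 1,43432", "cat,val 2,4342", "dog,value,23",
             "cat2,value,2222", "hedgehog,input,233");
my @file2 = ("cat2,value,312", "cat,val 2,11", "cat,val 3,22",
             "dog,value,23", "hedgehog,input,2145", "bird,output,9999");

print "file1.txt:\n",   map { "$_\n" } only_in_first(\@file1, \@file2);
print "\nfile2.txt:\n", map { "$_\n" } only_in_first(\@file2, \@file1);
```

Only the keys of the other file go into a hash here; the lines themselves are walked in order with grep, which is why the original ordering survives.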
0

I think the problem above can be solved with either of the following approaches:

a) Use a hash, as described in the answers above.

b) Sort and compare:

1. Sort both files by key1 and key2 (using the sort function).

2. Iterate through FILE1:

Match the key1 and key2 entry of FILE1 with FILE2
    If yes then
        take action by printing the common line to the desired file as required
        Move to the next row in FILE1 (continue with the loop)
    If no then
        Iterate through FILE2 starting from POS-FILE2 until a match is found
            Match the key1 and key2 entry of FILE1 with FILE2
            If yes then
                take action by printing the common line to the desired file as required
                set FILE2-END to true
                exit from the loop, noting the position in FILE2
            If no then
                take action by printing the unmatched line to the desired file as required
                Move to the next row in FILE2
If FILE2-END is true
    The rest of the lines in FILE1 do not exist in FILE2
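
The sort-and-compare idea can be sketched in Perl roughly as follows. This is a simplified merge-style variant rather than a line-by-line translation of the pseudocode above. It assumes both files fit in memory, that the first two comma-separated fields form the key, and that each key occurs at most once per file. The names key_of and merge_diff are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: the key is the first two comma-separated fields.
sub key_of {
    my ($line) = @_;
    return join ",", (split /,/, $line)[0, 1];
}

# Sort both line lists by key, then walk them in parallel.
# Returns two array refs: lines whose key occurs only in the first
# list, and lines whose key occurs only in the second list.
sub merge_diff {
    my ($lines_a, $lines_b) = @_;
    my @left  = sort { key_of($a) cmp key_of($b) } @$lines_a;
    my @right = sort { key_of($a) cmp key_of($b) } @$lines_b;
    my (@only_left, @only_right);
    my ($i, $j) = (0, 0);

    while ($i < @left && $j < @right) {
        my $cmp = key_of($left[$i]) cmp key_of($right[$j]);
        if    ($cmp == 0) { $i++; $j++ }                      # key in both lists
        elsif ($cmp <  0) { push @only_left,  $left[$i++] }   # key missing on the right
        else              { push @only_right, $right[$j++] }  # key missing on the left
    }

    # Whatever remains in either list has no partner in the other.
    push @only_left,  @left[$i .. $#left];
    push @only_right, @right[$j .. $#right];
    return (\@only_left, \@only_right);
}

# A shortened version of the question's data.
my ($only1, $only2) = merge_diff(
    [ "cat,val 1,43432", "cat,val 2,4342", "dog,value,23" ],
    [ "cat,val 2,11", "cat,val 3,22", "dog,value,23" ],
);
print "file1.txt:\n",   map { "$_\n" } @$only1;
print "\nfile2.txt:\n", map { "$_\n" } @$only2;
```

Because both lists are sorted by the same key, each line is visited once after sorting, and the results come out in key order rather than file order.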