Finding common entries in different text files

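I am new to Perl. I have eight text files, each with more than five thousand lines. I want to write a Perl script that finds the entries (records) that appear in the first five files but not in the last three. Say the files are A, B, C, D, E, F, G and H: I want the entries that are found in A to E but not in F to H.

Can someone please advise how to write code for this job?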

Source

2012-06-20 user1467925

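If I understand correctly, you need to:

Make a list of the items in A-E (call it list 1).
Make another list of the items in F-H (list 2).
Find all the items in list 1 that are not in list 2.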

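Rather than using two lists, you would use two hashes. Hash keys are unique, so duplicates collapse automatically and checking whether an item is in the other set is a single lookup.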

use strict;
use warnings;

# Two sets of files to be compared.
my @Set1 = qw(A B C D E); 
my @Set2 = qw(F G H); 

# Get all the items out of each set into hash references 
my $items_in_set1 = get_items(@Set1); 
my $items_in_set2 = get_items(@Set2); 

my %unique_to_set1; 
for my $item (keys %$items_in_set1) { 
    # If an item in set 1 isn't in set 2, remember it. 
    $unique_to_set1{$item}++ if !$items_in_set2->{$item}; 
} 

# Print them out 
print join "\n", keys %unique_to_set1; 

sub get_items {
    my @files = @_;

    my %items;
    for my $file (@files) {
        open my $fh, "<", $file or die "Can't open $file: $!";
        while (my $item = <$fh>) {
            chomp $item;
            $items{$item}++;
        }
    }

    return \%items;
}

If it's a one-off, you can do it in the shell.

cat A B C D E | sort | uniq > set1 
cat F G H | sort | uniq > set2 
comm -23 set1 set2

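cat A B C D E concatenates the files into a single stream. That stream is handed to sort, and then uniq removes the duplicates (uniq only works when the lines are sorted). The result is stored in the file set1. The same is done for the second set. Then comm compares the two set files: -2 suppresses the lines found only in set2 and -3 suppresses the lines common to both, so only the lines unique to set1 are shown.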
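To see the pipeline on something small, here is a minimal sketch that runs it in a temporary directory. The sample entries and the reduced file set (A and B for the first set, F for the second) are made up for illustration. It uses sort -u, which should be equivalent to sort | uniq, and sets LC_ALL=C on the assumption that sort and comm must agree on collation order.

```shell
#!/bin/sh
# Work in a throwaway directory so the demo never touches real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data: one entry per line.
printf 'apple\nbanana\n' > A
printf 'cherry\napple\n' > B
printf 'banana\ndate\n'  > F

# sort -u sorts and removes duplicates in one step.
# LC_ALL=C keeps sort and comm using the same ordering.
LC_ALL=C sort -u A B > set1
LC_ALL=C sort -u F   > set2

# -2 drops lines unique to set2, -3 drops lines common to both,
# leaving only the lines unique to set1.
result=$(LC_ALL=C comm -23 set1 set2)
printf '%s\n' "$result"

# Clean up the temporary directory.
cd - >/dev/null && rm -rf "$workdir"
```

Here banana appears in both sets and date only in the second, so only apple and cherry should be printed.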

Source

2012-06-20 05:32:27 Schwern

+1 for the excellent solution using shell utilities. – tuxuday
