Find mismatches in the second column between 2 text files

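I have these 2 text files and I want to find any mismatch in the second column between them. The mismatch is determined by the types F, P and N, regardless of which line they appear on. The first file has 1 F and 3 P; the second file has 2 P, 1 N and 1 F. When the two files are compared, both should end up with the same counts for each type: 1 F, 3 P and 1 N.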

Text 1:

f0x11 F 
f0x34 P 
drx99 
dex67 P 
edx43 P 
sdx33

Text 2:

Expected output:

Text 1 has missing type of N 
Text 2 has missing type of P

What I have tried so far does not produce the desired output.

Code:

use strict; 
my %ref_data; 
my %ref_data2; 
open my $fh, '<', 'Text1' or die "Could not open file to read:$!"; 
while (<$fh>) { 
    chomp; 
    my ($res, $type) = split; 
    if (defined $type){ 
      $ref_data{$type} = "$type"; 
      }   
} 
our ($data,$data2); 
open $fh, '<', 'Text2' or die "Could not open file to read:$!"; 
while (<$fh>) { 
    chomp; 
my ($res, $type) = split; 
    if (defined $type){ 
       $ref_data2{$type}= "$type"; 
       $data2= $ref_data2{$type}; 
       $data = $ref_data{$type}; 
       print "File 2 has missing type of $type\n" unless $data; 
     } 
    } 
foreach ($data){ 
print "File 1 has missing type of $_\n" if $data ne $data2; 
}

asked 2013-12-21 by annel

Please advise. Thanks in advance. – annel

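I don't see how you get the expected output from your sample input. Both of your input files have one F, just on different lines: why does your output say that only one of them has a "missing type of F"? –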

@llmari That was a typo. – annel

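I have refactored your code; you seemed to be repeating the same logic for each file.

The output does not follow your spec, but it should be clear enough for you to understand it and finish the job yourself.

I added a close $fh; and use warnings; as well.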

#!/usr/bin/perl 

use strict; 
use warnings; 

# run
my %max; # highest per-file count seen for each type
my $file_data_1 = parse_file_into_hash("text1", \%max);
my $file_data_2 = parse_file_into_hash("text2", \%max);
diff_hashes(\%max, $file_data_1, $file_data_2);

# diff_hashes($max, $h1, $h2)
#
# diffs 2 hash refs against a combined $max hash and prints results
sub diff_hashes {
    my ($max, $h1, $h2) = @_;

    # TODO - do all the comparisons and some error checking (if keys exist etc...) here
    for my $key (sort keys %$max) { # sorted so the output order is stable
        print "max/combined: $key = $max->{$key}\n";

        my $h1_print = exists $h1->{$key} ? $h1->{$key} : "0";
        my $h2_print = exists $h2->{$key} ? $h2->{$key} : "0";

        print "h1: $key = $h1_print\n";
        print "h2: $key = $h2_print\n";
    }
}

# parse_file_into_hash($file, $max)
#
# $max is a hash reference (passed by reference) so you can count occurrences over
# multiple files...
# returns reference of hash ($type => number of occurrences in this file)
sub parse_file_into_hash {
    my ($file, $max) = @_;
    my %ref_data;

    open my $fh, '<', $file or die "Could not open '$file' to read: $!";
    while (my $line = <$fh>) {
        chomp $line;
        # split ' ' ignores leading whitespace; a line without a
        # second column leaves $type undefined
        my ($res, $type) = split ' ', $line;

        if (defined $type) {
            $ref_data{$type}++;

            # remember the highest per-file count for this type
            if (!exists $max->{$type} || $ref_data{$type} > $max->{$type}) {
                $max->{$type} = $ref_data{$type};
            }
        }
    }
    close $fh;

    return \%ref_data;
}

Output when run against your sample files:

$ ./example.pl 
max/combined: F = 1 
h1: F = 1 
h2: F = 1 
max/combined: N = 1 
h1: N = 0 
h2: N = 1 
max/combined: P = 3 
h1: P = 3 
h2: P = 2
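
The remaining step, turning per-file counts like these into the messages asked for in the question, might be sketched as follows. This is only an illustration under one assumption: a file counts as "missing" a type whenever it has fewer occurrences of that type than the other file. The helper missing_types is hypothetical and is not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer's code): takes two hash refs of
# type => count plus a label for each file, and returns one message for every
# type that a file has fewer occurrences of than the other file.
sub missing_types {
    my ($h1, $h2, $name1, $name2) = @_;

    # Union of the types seen in either file.
    my %seen = map { $_ => 1 } keys %$h1, keys %$h2;

    my (@first, @second);
    for my $type (sort keys %seen) {
        my $n1 = $h1->{$type} // 0;    # a type absent from a file counts as 0
        my $n2 = $h2->{$type} // 0;
        push @first,  "$name1 has missing type of $type" if $n1 < $n2;
        push @second, "$name2 has missing type of $type" if $n2 < $n1;
    }
    return (@first, @second);
}

# Counts taken from the question: Text 1 has 1 F and 3 P,
# Text 2 has 2 P, 1 N and 1 F.
my @messages = missing_types(
    { F => 1, P => 3 },
    { F => 1, N => 1, P => 2 },
    'Text 1', 'Text 2',
);
print "$_\n" for @messages;
```

With the hash refs returned by parse_file_into_hash, the call would be missing_types($file_data_1, $file_data_2, 'Text 1', 'Text 2').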

answered 2013-12-21 05:44:47 by chrsblck

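Thanks for the quick response. To be clear, I want to check whether the types in column 2 of Text 1 are the same as the types in the other file. I have '1F, 3P' in the first file and '2P, 1N and 1F' in the second file. When compared, regardless of their order or line, both files should have the same number of each type, "1F, 3P and 1N". Hence the messages "File 1 has missing type of N, P" and "File 2 has missing type of P". – annel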

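@annel - I changed my example to do what you want. I did not format the output the way you want it; you should be able to do that part yourself. This looks like a homework assignment to me... – chrsblck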

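You seem to want to keep track of how many times each column 2 value occurs in each file. For example, in a comment you wrote, "I have 1F, 3P in the first file and 2P, 1N and 1F in the second file". If so, you need a better data structure.

Specifically, you need to count the occurrences of the values in column 2, and you need to track those counts separately for each file. That suggests a hash of hashes.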

use strict; 
use warnings; 

# Example usage: 
# perl YOUR_SCRIPT.pl a.txt b.txt 
my @files = @ARGV; 

# Count the values in Column 2, organizing the tallies like this: 
# $tallies{COL_2}{FILE_NAME} = N 
my %tallies; 
while (<>) { 
    my @cols = split; 
    $tallies{$cols[1]}{$ARGV} ++ if @cols > 1; 
} 

# Print discrepancies. 
for my $c2 (keys %tallies) { 
    my @t = map { $tallies{$c2}{$_} || 0 } @files; 
    next if $t[0] == $t[1]; 
    print "$c2: $files[0] has $t[0]; $files[1] has $t[1]\n"; 
}

Example output:

N: a.txt has 0; b.txt has 1 
P: a.txt has 3; b.txt has 2

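Also worth noting: this code does not explicitly open any files, and the file names are not hard-coded in the program. Instead, we pass the input file names as command-line arguments, get those arguments via @ARGV, process the lines of those files via <>, and know which file we are currently processing via $ARGV.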
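
The <> and $ARGV idiom described here can be tried in isolation with a small sketch like the one below. It is only an illustration: count_lines_per_file and write_temp are hypothetical helpers, and the sample data is made up so the script also runs without arguments. It shows why the file names are copied out of @ARGV first: <> removes each name from @ARGV as it opens that file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: count the lines of each named file using <>.
# <> reads from the files listed in @ARGV, and $ARGV holds the name
# of the file currently being read.
sub count_lines_per_file {
    my @names = @_;
    local @ARGV = @names;    # <> shifts names off @ARGV as it opens them
    my %count;
    while (my $line = <>) {
        $count{$ARGV}++;
    }
    return \%count;
}

# Hypothetical helper for this sketch only: write text to a temporary
# file (removed when the program exits) and return the file's name.
sub write_temp {
    my ($content) = @_;
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $content;
    close $fh or die "Could not close $name: $!";
    return $name;
}

# Use the command-line arguments if given, otherwise made-up sample data.
my @files = @ARGV
    ? @ARGV
    : (write_temp("a1 F\na2 P\n"), write_temp("b1 N\n"));

my $count = count_lines_per_file(@files);
for my $i (0 .. $#files) {
    printf "input %d: %d line(s)\n", $i + 1, $count->{ $files[$i] } // 0;
}
```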

answered 2013-12-21 19:09:37 by FMc
