Hash with multiple values for a single key

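I have two files. One is a list of unique names; the other is a redundant list of names with ages, in which the same name can appear more than once.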

For example:

File1:  File2: 
Gaia  Gaia 3 
Matt  Matt 12 
Jane  Gaia 89 
      Reuben 4

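My goal is to match File1 against File2 and retrieve the highest age for each name. So far I have written the code below. The part that does not work properly is: when the same key is already in the hash, keep (and print) the larger value.

Any suggestions or comments are welcome!

Thanks!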

#!/usr/bin/perl -w 
use strict; 

open (FILE1, $ARGV[0])|| die "unable to open arg1\n"; #Opens first file for comparison 
open (FILE2, $ARGV[1])|| die "unable to open arg2\n"; #2nd for comparison 

my @not_red = <FILE1>; 
my @exonslength = <FILE2>; 

#2) Produce an Hash of File2. If the key is already in the hash, keep the couple key-   value with the highest value. Otherwise, next. 

my %hash_doc2; 
my @split_exons; 
my $key; 
my $value; 

foreach my $line (@exonslength) { 

    @split_exons = split "\t", $line; 

    @hash_doc2 {$split_exons[0]} = ($split_exons[1]); 

if (exists $hash_doc2{$split_exons[0]}) { 

    if ($hash_doc2{$split_exons[0]} > values %hash_doc2) { 

    $hash_doc2{$split_exons[0]} = ($split_exons[1]); 

    } else {next;} 
     } 
    } 

#3) grep the non redundant list of gene from the hash with the corresponding value 

my @a = grep (@not_red,%hash_doc2); 
print "@a\n";


2012-11-05 Gaia Andreoletti

Please post the contents of the two input files, formatted as code. – amphibient

Do you need to keep all the values? If not, you can keep only the maximum:

@split_exons = split "\t", $line; 
# Store the age if the name is new or if this age beats the stored one. 
if (! exists $hash_doc2{$split_exons[0]} 
    or $hash_doc2{$split_exons[0]} < $split_exons[1]) { 
    $hash_doc2{$split_exons[0]} = $split_exons[1]; 
}
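As a minimal sketch of this keep-the-maximum idea, the snippet below runs it on the sample data from the question. The sub name `max_age_per_name` is hypothetical, and splitting on any whitespace is an assumption (the question's own code splits on tabs).

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref mapping each name to its highest age.
sub max_age_per_name {
    my @lines = @_;
    my %max;
    for my $line (@lines) {
        # split ' ' splits on runs of whitespace and drops the trailing newline
        my ($name, $age) = split ' ', $line;
        next unless defined $age;    # skip blank or malformed lines
        # Store the age if the name is new or this age is larger.
        if (!exists $max{$name} or $max{$name} < $age) {
            $max{$name} = $age;
        }
    }
    return \%max;
}

my $max = max_age_per_name("Gaia 3\n", "Matt 12\n", "Gaia 89\n", "Reuben 4\n");
print "$_\t$max->{$_}\n" for sort keys %$max;
```

Only one scalar is stored per name, so memory use stays proportional to the number of distinct names rather than the number of lines.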

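Your code does not keep all the values. You cannot store an array directly in a hash value; you have to store a reference to it. Adding a new value to such an array can be done with push: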

push @{ $hash_doc2{$split_exons[0]} }, $split_exons[1];
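If all values are kept this way, the maximum can be computed afterwards. The sketch below assumes tab-separated sample lines and uses `max` from the core module List::Util; the variable names `%ages` and `%highest` are made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw(max);

my %ages;    # name => reference to an array of every age seen for that name
for my $line ("Gaia\t3\n", "Matt\t12\n", "Gaia\t89\n", "Reuben\t4\n") {
    chomp(my $copy = $line);
    my ($name, $age) = split /\t/, $copy;
    push @{ $ages{$name} }, $age;    # the array is autovivified on first use
}

# Reduce each list of ages to its maximum only when it is needed.
my %highest;
$highest{$_} = max @{ $ages{$_} } for keys %ages;

print "$_\t$highest{$_}\n" for sort keys %highest;
```

This costs more memory than storing a single maximum, so it is mainly worth it when the other values are needed later.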

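Your numeric comparison against values also does not do what you think. A comparison operator such as > imposes scalar context, so values returns the number of values rather than the values themselves. Another option is to store the values sorted and always ask for the highest one: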

# Sort numerically; the "|| []" guards against the first occurrence of a key. 
$hash_doc2{$split_exons[0]} = [ sort { $a <=> $b } @{ $hash_doc2{$split_exons[0]} || [] }, $split_exons[1] ]; 
# max for $x is at $hash_doc2{$x}[-1]
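To illustrate the point about `values` in scalar context, here is a small sketch with a made-up hash `%h`. It shows that the comparison sees the element count, not any of the stored ages.

```perl
use strict;
use warnings;

my %h = (Gaia => 3, Matt => 12, Reuben => 4);

# Assigning to a scalar forces scalar context: values() yields a count.
my $count = values %h;
print "count: $count\n";

# The comparison below tests 2 against the number of values in %h,
# not against any stored age.
my $is_greater = (2 > values(%h)) ? "true" : "false";
print "2 > values(\%h) is $is_greater\n";
```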


2012-11-05 16:55:46 choroba

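Instead of reading the whole of file 2 into an array (which would be bad if the file is large), you can loop over the data file and process it line by line: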

#!/usr/bin/perl 

use strict; 
use warnings; 
use autodie; 
use Data::Dumper; 

open(my $nameFh, '<', $ARGV[0]); 
open(my $dataFh, '<', $ARGV[1]); 

my $dataHash = {}; 
my $processedHash = {}; 

while(<$dataFh>){ 
    chomp; 
    my ($name, $age) = split /\s+/, $_; 
    if(! defined($dataHash->{$name}) or $dataHash->{$name} < $age){ 
     $dataHash->{$name} = $age 
    } 
} 

while(<$nameFh>){ 
    chomp; 
    $processedHash->{$_} = $dataHash->{$_} if defined $dataHash->{$_}; 
} 

print Dumper($processedHash);
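The final lookup step can also be written with grep and exists. This is only a sketch: `%max_age` stands in for the per-name maxima computed from File2, `@names` for the contents of File1, and both are hard-coded from the question's sample data.

```perl
use strict;
use warnings;

# Assumed result of the per-name maximum step on the sample data.
my %max_age = (Gaia => 89, Matt => 12, Reuben => 4);
my @names   = qw(Gaia Matt Jane);    # contents of File1

# Keep only the names from File1 that have an age in File2.
my @matched = grep { exists $max_age{$_} } @names;
print "$_\t$max_age{$_}\n" for @matched;

# Names with no entry in File2 (Jane here) can be reported separately.
my @missing = grep { !exists $max_age{$_} } @names;
print "no age found: @missing\n" if @missing;
```

Note that grep takes the test first and the list second, which is the reverse of the call in the question.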


2012-11-05 17:00:16 beresfordt
