Matching values in hashes

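I have two arrays of hashes. I want to narrow one down based on a variable in the other.

The first array contains hashes with the keys seqname, source, feature, start, end, score, strand, frame, geneID and transcriptID.

The second array contains hashes with the keys organism, geneID, number, motifnumber, position, strand and sequence.

What I want to do is go through the first array and remove every hash whose geneID is not found in any of the hashes of the second array. Note that both kinds of hash have a geneID key. In short, I want to keep those hashes in the first array whose geneID value is found in one of the hashes of the second array.

My attempt so far uses two loops: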

my @subset; # define a new array for the wanted hashes to go into.

for my $i (0 .. $#first_hash_array) { # Begin loop to go through the hashes of the first array.
    for my $j (0 .. $#second_hash_array) { # Begin loop through the hashes of the 2nd array.
        if ($second_hash_array[$j]{geneID} =~ m/$first_hash_array[$i]{geneID}/) {
            push @subset, $second_hash_array[$j];
        }
    }
}

But I'm not sure this is the right way to go about it.

2013-04-15 Ward9250

For starters, $a =~ /$b/ doesn't check for equality. You need

$second_hash_array[$j]{geneID} =~ m/^\Q$first_hash_array[$i]{geneID}\E\z/

or simply

$second_hash_array[$j]{geneID} eq $first_hash_array[$i]{geneID}

for that.
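To make the difference concrete, here is a minimal sketch using made-up IDs (the values 'gene1' and 'gene10' are hypothetical, not taken from the question's data). An unanchored match treats the right-hand side as a pattern, so a shorter ID matches inside a longer one, and any regex metacharacters in the ID would be interpreted rather than matched literally.

```perl
use strict;
use warnings;

# Hypothetical IDs chosen to show the pitfall.
my $first_id  = 'gene1';
my $second_id = 'gene10';

# Unanchored pattern match: true, because 'gene1' occurs inside 'gene10'.
my $pattern_match = $second_id =~ m/$first_id/ ? 1 : 0;

# Anchored, quoted match: false, the whole string would have to be 'gene1'.
my $anchored_match = $second_id =~ m/^\Q$first_id\E\z/ ? 1 : 0;

# String equality: also false, and the simplest way to express the test.
my $string_equal = $second_id eq $first_id ? 1 : 0;

# prints: pattern=1 anchored=0 eq=0
print "pattern=$pattern_match anchored=$anchored_match eq=$string_equal\n";
```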

Secondly,

for my $i (0 .. $#first_hash_array) { 
    ... $first_hash_array[$i] ... 
}

can be written more succinctly as

for my $first (@first_hash_array) { 
    ... $first ... 
}

Next up is

for my $second (@second_hash_array) { 
    if (...) { 
     push @subset, $second; 
    } 
}

which can add $second to @subset more than once. You either need to add a last

# Perform the push if the condition is true for any element. 
for my $second (@second_hash_array) { 
    if (...) { 
     push @subset, $second; 
     last; 
    } 
}

or move the push out of the loop

# Perform the push if the condition is true for all elements. 
my $flag = 1; 
for my $second (@second_hash_array) { 
    if (!...) { 
     $flag = 0; 
     last; 
    } 
} 

if ($flag) { 
    push @subset, $first;  # $second is out of scope here; assumes the enclosing "for my $first" loop
}

depending on what you want to do.

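To delete from an array, you can use splice. But removing an element shifts the indexes of all the elements after it, so it's best to iterate over the array backwards (from the last index to the first).

Not only is that complicated, it's also expensive. Every time you splice, all the subsequent elements in the array need to be moved.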
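For illustration only, the splice approach might look like the sketch below. The sample records and the %wanted lookup are hypothetical stand-ins for the question's data.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the question's first array.
my @first_hash_array = (
    { geneID => 'ID_001', feature => 'exon' },
    { geneID => 'ID_002', feature => 'exon' },
    { geneID => 'ID_003', feature => 'CDS'  },
);

# Hypothetical set of geneIDs that should survive.
my %wanted = ( ID_002 => 1 );

# Walk from the last index to the first so that removing an element
# never changes the index of an element we have yet to visit.
for my $i (reverse 0 .. $#first_hash_array) {
    if (!$wanted{ $first_hash_array[$i]{geneID} }) {
        splice(@first_hash_array, $i, 1);
    }
}

print join(',', map { $_->{geneID} } @first_hash_array), "\n";
```

Iterating forwards instead would skip the element that slides into the slot just vacated, which is why the index list is reversed.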

A better approach is to filter the elements and assign the resulting elements back to the array.

my @new_first_hash_array;
for my $first (@first_hash_array) {
    my $found = 0;
    for my $second (@second_hash_array) {
        if ($first->{geneID} eq $second->{geneID}) {
            $found = 1;
            last;
        }
    }

    if ($found) {
        push @new_first_hash_array, $first;
    }
}

@first_hash_array = @new_first_hash_array;

Iterating over @second_hash_array repeatedly is needlessly expensive. Build a lookup hash of its geneIDs once instead:

my %geneIDs_to_keep; 
for (@second_hash_array) { 
    ++$geneIDs_to_keep{ $_->{geneID} }; 
} 

my @new_first_hash_array; 
for (@first_hash_array) { 
    if ($geneIDs_to_keep{ $_->{geneID} }) { 
     push @new_first_hash_array, $_; 
    } 
} 

@first_hash_array = @new_first_hash_array;

Finally, we can replace the for with grep to give the following simple and efficient answer:

my %geneIDs_to_keep; 
++$geneIDs_to_keep{ $_->{geneID} } for @second_hash_array; 

@first_hash_array = grep $geneIDs_to_keep{ $_->{geneID} }, @first_hash_array;
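As a worked illustration, the sketch below runs the same idea on a few hypothetical records (the sample data is made up; only the geneID key matters). grep sets $_ to each element of the list in turn and returns the elements for which the test is true; here the block form of grep is used, which behaves the same as the expression form above.

```perl
use strict;
use warnings;

# Hypothetical sample records.
my @first_hash_array = (
    { geneID => 'ID_001', feature => 'exon' },
    { geneID => 'ID_002', feature => 'CDS'  },
    { geneID => 'ID_003', feature => 'exon' },
);
my @second_hash_array = (
    { geneID => 'ID_002', organism => 'A' },
    { geneID => 'ID_003', organism => 'A' },
    { geneID => 'ID_003', organism => 'B' },
);

# Build a lookup of every geneID present in the second array.
# The value is a count, so it is true for any ID seen at least once.
my %geneIDs_to_keep;
++$geneIDs_to_keep{ $_->{geneID} } for @second_hash_array;

# Keep only the hashes whose geneID has an entry in the lookup.
@first_hash_array = grep { $geneIDs_to_keep{ $_->{geneID} } } @first_hash_array;

# prints: ID_002,ID_003
print join(',', map { $_->{geneID} } @first_hash_array), "\n";
```

Each hash lookup takes roughly constant time, so the second array is scanned once to build the lookup rather than once per element of the first array.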

2013-04-15 18:00:39 ikegami

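Thanks for the answer. I can't be sure, but I think this actually deletes what I want and keeps what I want to get rid of. If I want to keep only the hashes in first_hash_array whose geneID matches one in second_hash_array, shouldn't it be: 'my %geneIDs_to_keep; ++$geneIDs_to_keep{ $_->{geneID} } for @second_hash_array;' to get the IDs I want to keep, and then something like 'my @new_array = grep $geneIDs_to_keep{ $_->{geneID} }, @first_hash_array;'? – Ward9250

Also, if you have time, could you expand on the last code block? With my novice understanding, I can see that the first two lines create a hash of all the geneIDs to be deleted/kept, by looping over each hash in the array and taking the geneID from each one, using a loop and the default variable. The last line is harder for me to understand. I've looked at the grep page here: http://perldoc.perl.org/functions/grep.html. Given a simple example like '@foo = grep { !/^#/ } @bar;', it's the '!$geneIDs_to_delete{ $_->{geneID} }' part that I'm having trouble interpreting. – Ward9250

Reading it again: the last line iterates over '@first_hash_array', setting '$_', so for example the '$_->{geneID}' part becomes 'ID_002' if that is the 'geneID' of an element of 'first_hash_array', and the rest then becomes 'grep !$geneIDs_to_delete{ID_002}', which tests whether ID_002 is in the list of genes to delete? – Ward9250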

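This is how I would do it.

Create an array req_geneID for the required geneIDs and put all the geneIDs from the second array's hashes into it.

Traverse the first array and check whether each geneID is contained in the req_geneID array. (It's easy in Ruby using "include?", but you can try this in Perl.)

Finally, delete the hashes that don't match any geneID in req_geneID, using this in Perl: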

for (keys %hash) 
{ 
    delete $hash{$_}; 
}
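The steps above can be sketched roughly as follows. This is one possible reading of the approach, not the answerer's own code: the sample records and the @kept array are hypothetical, and the membership test uses grep in scalar context as a stand-in for Ruby's "include?". Rather than deleting in place, it collects the matching hashes into a new array.

```perl
use strict;
use warnings;

# Hypothetical sample records.
my @first_hash_array = (
    { geneID => 'ID_001' },
    { geneID => 'ID_002' },
);
my @second_hash_array = (
    { geneID => 'ID_002' },
);

# Step 1: collect the required geneIDs from the second array.
my @req_geneID = map { $_->{geneID} } @second_hash_array;

# Steps 2 and 3: keep a hash from the first array only if its geneID
# appears in @req_geneID. grep in scalar context returns the match count.
my @kept;
for my $first (@first_hash_array) {
    my $matches = grep { $_ eq $first->{geneID} } @req_geneID;
    push @kept, $first if $matches;
}

print join(',', map { $_->{geneID} } @kept), "\n";
```

Note that the inner grep scans @req_geneID once for every element of the first array, so a hash lookup keyed on geneID is likely to scale better on large inputs.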

Hope this helps.. :)

2013-04-15 18:00:59 BabbarTushar
