How to compare hash values by key

I have some input data in the following format (tab-delimited):

(gene condition value)

wnt condition1 1 
wnt condition2 10 
wnt condition3 15 
wnt condition4 -1 
bmp condition1 10 
bmp condition2 inf 
bmp condition3 12 
bmp condition4 -1 
frz condition1 -12 
frz condition2 -6 
frz condition3 -0.3

and am building a hash of hashes (HoH) as follows:

#!/usr/bin/perl 
use warnings; 
use strict; 
use File::Slurp; 
use Data::Dumper; 

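my @data = read_file('stack.txt');

my %hash;
foreach (@data){
    chomp;
    # Accept integers, decimals (e.g. -0.3) and inf/-inf as values;
    # skip any line that doesn't match
    my ($gene, $condition, $value) = /^(\w+)\t(\w+\d)\t(-?(?:\d+(?:\.\d+)?|inf))/
        or next;
    $hash{$gene}{$condition} = $value;
}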

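I want to iterate through the HoH and, for each gene, print out its values only if all of that gene's values are positive (e.g. 10) or all are negative (e.g. -3). For the data above I would print only: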

frz condition1 -12 
frz condition2 -6 
frz condition3 -0.3

because the other two genes have conditions with both positive and negative values:

wnt condition1 1 
wnt condition2 10 
wnt condition3 15 
wnt condition4 -1 # discrepancy 

bmp condition1 10 
bmp condition2 inf 
bmp condition3 12 
bmp condition4 -1 # discrepancy

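I can loop through the hash as follows, but I'm not sure how to compare each value of a gene's conditions with the "next" value under the same key: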

for my $gene (sort keys %hash) { 
    for my $condition (sort keys %{$hash{$gene}}) { 
     my $value = $hash{$gene}{$condition}; 
     print "$gene\t$condition\t$value\n" if $value =~ m/-/; # This obviously will only print out negative values. I want to compare all values here, and if they are all positive, or all negative, print them.   
    } 
}
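The missing comparison can be sketched by checking every value's sign against the sign of the first value for that gene, which avoids tracking a "next" value at all. This is a minimal sketch, not code from the original post: is_negative and all_same_sign are hypothetical helper names, the HoH is built inline from the sample data, and, as in the loop above, a value counts as negative only if it starts with a minus sign (so inf and 0 count as positive).

```perl
use strict;
use warnings;

# Hypothetical helper: a value is negative if it starts with '-'
# (so 'inf' and 0 count as positive, '-inf' as negative).
sub is_negative { return $_[0] =~ /^-/ ? 1 : 0 }

# Hypothetical helper: true if every value has the same sign as the first one.
sub all_same_sign {
    my @values = @_;
    return 1 unless @values;                     # empty list: trivially consistent
    my $first = is_negative($values[0]);
    for my $v (@values[1 .. $#values]) {
        return 0 if is_negative($v) != $first;   # stop at the first mismatch
    }
    return 1;
}

# The HoH from the question, built inline for illustration.
my %hash = (
    wnt => { condition1 => 1,   condition2 => 10,    condition3 => 15,     condition4 => -1 },
    bmp => { condition1 => 10,  condition2 => 'inf', condition3 => 12,     condition4 => -1 },
    frz => { condition1 => -12, condition2 => -6,    condition3 => '-0.3' },
);

for my $gene (sort keys %hash) {
    my $conds = $hash{$gene};
    next unless all_same_sign(values %$conds);
    print "$gene\t$_\t$conds->{$_}\n" for sort keys %$conds;
}
```

Run as-is, this prints only the three frz lines.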

Let me know if I can clarify this further.


2013-10-09 fugu

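This code solves the problem by checking all the values for each gene in the hash, incrementing $neg if a value contains a minus sign and $pos otherwise. If either the positive count or the negative count is zero, then all the values have the same sign, and the data for that gene is sorted and displayed.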

Note that this counts inf and 0 as positive, which may or may not be what you want.

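Note that using read_file is wasteful, as it pulls the entire file into memory at once. Instead of looping over an array, you can use a while loop and read the file line by line. With use autodie there is no need to check whether the open call succeeded.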

use strict; 
use warnings; 
use autodie; 

open my $fh, '<', 'stack.txt'; 

my %data; 

while (<$fh>) { 
    chomp; 
    my ($gene, $condition, $value) = split /\t/; 
    $data{$gene}{$condition} = $value; 
} 

while (my ($gene, $values) = each %data) { 

    my ($pos, $neg) = (0, 0); 

    ++(/-/ ? $neg : $pos) for values %$values; 

    unless ($neg and $pos) { 
        for my $condition (sort keys %$values) {
            printf "%s\t%s\t%s\n", $gene, $condition, $values->{$condition};
        }
    }
}

Output

frz condition1 -12 
frz condition2 -6 
frz condition3 -0.3


2013-10-09 20:35:28 Borodin

Thanks - much easier to understand, and it works well! – fugu

How would I go about extending the while loop to a HoHoA? – fugu

@FlyingFrog: That depends on what you want to do. You should ask another question. I take it you have multiple values for the same gene/condition? – Borodin
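A rough sketch of one way to extend this to a HoHoA, assuming (as the reply above guesses) that the same gene/condition pair can appear on several lines: push each value onto an array, then flatten all of a gene's arrays before counting signs. The sample lines are made up for illustration, input comes from an in-memory string rather than stack.txt, and has_discrepancy is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical sample input with repeated gene/condition pairs (tab-separated).
my $input = <<"END";
wnt\tcondition1\t1
wnt\tcondition1\t-2
frz\tcondition1\t-12
frz\tcondition1\t-6
frz\tcondition2\t-0.3
END

my %data;    # gene => { condition => [ values... ] }, i.e. a HoHoA
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
while (my $line = <$fh>) {
    chomp $line;
    my ($gene, $condition, $value) = split /\t/, $line;
    push @{ $data{$gene}{$condition} }, $value;    # autovivifies the array
}
close $fh;

# Hypothetical helper: does this gene mix values with and without a leading '-'?
sub has_discrepancy {
    my ($conds) = @_;
    my @all = map { @$_ } values %$conds;    # flatten every condition's array
    my $neg = grep { /^-/ } @all;            # scalar grep: number of negatives
    return $neg > 0 && $neg < @all;
}

for my $gene (sort keys %data) {
    next if has_discrepancy($data{$gene});
    for my $condition (sort keys %{ $data{$gene} }) {
        print join("\t", $gene, $condition, @{ $data{$gene}{$condition} }), "\n";
    }
}
```

With this sample input, wnt is skipped and each frz condition is printed with all of its values on one line.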

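Rather than comparing each value with its neighbour, you can iterate over the whole list of values for a given gene, incrementing separate counters for positive and negative values, and then compare the counts to see whether a discrepancy exists.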

Assuming your data matches the following structure:

'bmp' => HASH(0x7324710) 
    'condition1' => 10 
    'condition2' => 'inf' 
    'condition3' => 12 
    'condition4' => '-1' 
'frz' => HASH(0x7323c78) 
    'condition1' => '-12' 
    'condition2' => '-6' 
    'condition3' => '-0.3' 
'wnt' => HASH(0x72a5c30) 
    'condition1' => 1 
    'condition2' => 10 
    'condition3' => 15 
    'condition4' => '-1'

Substituting this for the last code block in your question will give you the result you need:

for my $gene (sort keys %hash) {
    # These variables will contain:
    # - Counts of positive and negative values
    my ($pos_vals, $neg_vals) = (0, 0);
    # - A true/false value indicating whether a discrepancy exists
    my $discrepant = undef;
    # - A list of the values of all conditions for a given gene,
    #   collected from the inner hash
    my @values = values %{ $hash{$gene} };

    # For each such value, test for a leading - and increment
    # the positive or negative value count accordingly
    for (@values) { $_ =~ m/^-/ ? $neg_vals++ : $pos_vals++ }

    # If neither counter is zero (i.e. both evaluate true), then
    # a discrepancy exists; otherwise, one doesn't -- either way,
    # we put the test result in $discrepant so as to produce a
    # cleaner test in the following if statement
    $discrepant = (($pos_vals > 0) and ($neg_vals > 0));

    # In the absence of a discrepancy...
    if (not $discrepant) {
        # iterate over the conditions for this gene and print the gene
        # name, the condition name, and the value
        # NB: this statement-modifier form is somewhat idiomatic Perl,
        # but you'll see it from time to time, so it's worth knowing about
        print "$gene\t$_\t$hash{$gene}->{$_}\n"
            foreach sort keys %{ $hash{$gene} };
    }
}

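NB: This will correctly handle positive and negative infinity, but it treats zero as positive, which may not be correct in your case. Do zero values occur in your data? If so, should they be treated as positive, negative, or neither?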
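If zeros should count as neither positive nor negative, one option is to map each value to -1, 0 or 1 before comparing. A minimal sketch under that assumption: sign_of and strictly_same_sign are hypothetical helpers, and the rule that any zero disqualifies a gene is an illustrative policy choice, not something the question specifies. inf and -inf are matched as text so they never reach the numeric comparison.

```perl
use strict;
use warnings;

# Hypothetical helper: -1, 0 or 1 for values such as '10', '-0.3', '0', 'inf', '-inf'.
sub sign_of {
    my ($v) = @_;
    if ($v =~ /^(-?)inf$/i) {    # infinities are recognised as text
        return $1 ? -1 : 1;
    }
    return $v <=> 0;             # ordinary numbers: compare against zero
}

# Hypothetical policy: a gene passes only if its values are all strictly
# positive or all strictly negative; any zero disqualifies it.
sub strictly_same_sign {
    my @signs = map { sign_of($_) } @_;
    return 0 if grep { $_ == 0 } @signs;
    my $neg = grep { $_ < 0 } @signs;
    return ($neg == 0 || $neg == @signs) ? 1 : 0;
}

for my $case ([frz => -12, -6, '-0.3'], [with_zero => 10, 'inf', 12, 0]) {
    my ($name, @values) = @$case;
    printf "%s: %s\n", $name, strictly_same_sign(@values) ? 'consistent' : 'discrepant';
}
```

This prints "frz: consistent" and "with_zero: discrepant".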


2013-10-09 18:47:13

This is very good - thanks! – fugu

@FlyingFrog Glad to help! –

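This is an abuse of 'map', as 'for (@values) { $_ =~ m/^-/ ? $neg_vals++ : $pos_vals++ }' would do just fine. Also, many people, including me, would consider it an abuse of the conditional expression, since using it to modify its own operands is bad practice. – Borodin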
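In the same spirit, the two counts can be obtained with no side effects inside a conditional at all, because grep in scalar context returns the number of matching elements. A minimal sketch with a hypothetical count_signs helper; like the answer above, it treats every value without a leading minus sign (including 0 and inf) as positive.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (positive_count, negative_count) for a list of values.
sub count_signs {
    my @values   = @_;
    my $neg_vals = grep { /^-/ } @values;    # scalar-context grep = number of matches
    my $pos_vals = @values - $neg_vals;      # everything else counts as positive
    return ($pos_vals, $neg_vals);
}

my ($pos, $neg) = count_signs(10, 'inf', 12, -1);
print "pos=$pos neg=$neg discrepant=", ($pos && $neg ? 'yes' : 'no'), "\n";
```

For the bmp values this prints "pos=3 neg=1 discrepant=yes".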

-1

use feature 'say';

my @data = <$your_file_handle>;

my %hash;
foreach (@data){
    chomp;
    my ($gene, $condition, $value) = split; # Sorry, your regex didn't work for me,
                                            # hence the change.
    $hash{$gene}{$condition} = $value;
}

for my $gene (sort keys %hash){
    my $values = join '', values %{ $hash{$gene} };
    my $num = %{$hash{$gene}}/1; # Number of conditions

    # Print when no '-' is detected, or when the number of '-' equals the number of conditions.
    say $gene if ($values !~ /-/ or $values =~ tr/-/-/ == $num);
}


2013-10-09 19:56:21

You can't use a bare 'split', as you don't know that the data contains no spaces. – Borodin

Also, the *complete data* is required as output, not just the gene names, and the number of elements in a hash is normally determined using 'my $num = keys %{$hash{$gene}}'. – Borodin

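1. You could check how 'split' is used and see which of our uses is more robust. 2. If he knows 'perl', he should be able to work out how to produce the output; that's why I focused only on the "real problem". 3. Your downvote? Heh... –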
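Folding Borodin's two corrections into this answer's approach (count the conditions with keys, and print the complete data rather than only the gene name) might look like the sketch below. It keeps the tr/// count of minus signs, which assumes each value contains at most one '-' (a value such as 1e-5 would break it); signs_agree is a hypothetical helper name, and the HoH is built inline from the question's sample data.

```perl
use strict;
use warnings;

my %hash = (
    wnt => { condition1 => 1,   condition2 => 10,    condition3 => 15,     condition4 => -1 },
    bmp => { condition1 => 10,  condition2 => 'inf', condition3 => 12,     condition4 => -1 },
    frz => { condition1 => -12, condition2 => -6,    condition3 => '-0.3' },
);

# Hypothetical helper: true when no value, or every value, carries a minus sign.
sub signs_agree {
    my ($conds) = @_;
    my $num    = keys %$conds;            # number of conditions
    my $joined = join '', values %$conds;
    my $minus  = ($joined =~ tr/-//);     # tr/// returns how many '-' it found
    return $minus == 0 || $minus == $num;
}

for my $gene (sort keys %hash) {
    next unless signs_agree($hash{$gene});
    print "$gene\t$_\t$hash{$gene}{$_}\n" for sort keys %{ $hash{$gene} };
}
```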
