2013-10-09 108 views
0

How to compare hash values by key

I have some input data in the following tab-delimited format:

(gene condition value)

wnt condition1 1 
wnt condition2 10 
wnt condition3 15 
wnt condition4 -1 
bmp condition1 10 
bmp condition2 inf 
bmp condition3 12 
bmp condition4 -1 
frz condition1 -12 
frz condition2 -6 
frz condition3 -0.3 

and I am building a hash of hashes (HoH) as follows:

#!/usr/bin/perl 
use warnings; 
use strict; 
use File::Slurp; 
use Data::Dumper; 

my @data = read_file('stack.txt'); 

my %hash; 
foreach (@data){ 
    chomp; 
    my ($gene, $condition, $value) = (/^(\w+)\t(\w+\d)\t(-?\d+|-?inf)/); 
    $hash{$gene}{$condition} = $value; 
} 

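I want to iterate over the HoH and, for each gene, print its values provided that all of that gene's values are either positive (e.g. 10) or negative (e.g. -3). For the data above I would print only: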

frz condition1 -12 
frz condition2 -6 
frz condition3 -0.3 

because the other two genes each contain both positive and negative values across their conditions:

wnt condition1 1 
wnt condition2 10 
wnt condition3 15 
wnt condition4 -1 # discrepancy 

bmp condition1 10 
bmp condition2 inf 
bmp condition3 12 
bmp condition4 -1 # discrepancy 

I can iterate as follows, but I don't know how to compare one value in the HoH with the "next" value for the same gene across its condition keys:

for my $gene (sort keys %hash) { 
    for my $condition (sort keys %{$hash{$gene}}) { 
     my $value = $hash{$gene}{$condition}; 
     print "$gene\t$condition\t$value\n" if $value =~ m/-/; # This obviously will only print out negative values. I want to compare all values here, and if they are all positive, or all negative, print them.   
    } 
} 

Let me know if I can clarify this further.

Answers

1

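This code solves the problem by examining all the values for each gene in the hash, incrementing $neg if a value contains a minus sign and $pos otherwise. If either the positive count or the negative count is zero, then all the values have the same sign, and the data for that gene is sorted and displayed.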

This counts inf and 0 as positive, which may or may not be what you want.

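Note that using read_file is wasteful, because it pulls the whole file into memory at once. Instead of looping over an array, you can use a while loop and read the file line by line. With use autodie there is no need to check whether the open call succeeded.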

use strict; 
use warnings; 
use autodie; 

open my $fh, '<', 'stack.txt'; 

my %data; 

while (<$fh>) { 
    chomp; 
    my ($gene, $condition, $value) = split /\t/; 
    $data{$gene}{$condition} = $value; 
} 

while (my ($gene, $values) = each %data) { 

    my ($pos, $neg) = (0, 0); 

    ++(/-/ ? $neg : $pos) for values %$values; 

    unless ($neg and $pos) {
        for my $condition (sort keys %$values) {
            printf "%s\t%s\t%s\n", $gene, $condition, $values->{$condition};
        }
    }
} 

Output

frz condition1 -12 
frz condition2 -6 
frz condition3 -0.3 
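
As a variation on the counting approach above, the same check can be sketched with `all` from List::Util (available since version 1.33 of that core module). The helper name `same_sign` and the inline sample data are assumptions for illustration only. Like the code above, this sketch treats `inf` and 0 as positive.

```perl
use strict;
use warnings;
use List::Util qw(all);

# Hypothetical helper: true when every value is negative (leading '-')
# or when no value is negative. 'inf' and 0 therefore count as positive.
sub same_sign {
    my @values = @_;
    return (all { /^-/ } @values) || (all { !/^-/ } @values) ? 1 : 0;
}

# Sample data copied from the question, hard-coded here instead of read from a file
my %data = (
    wnt => { condition1 => 1,   condition2 => 10,    condition3 => 15, condition4 => -1 },
    bmp => { condition1 => 10,  condition2 => 'inf', condition3 => 12, condition4 => -1 },
    frz => { condition1 => -12, condition2 => -6,    condition3 => -0.3 },
);

for my $gene (sort keys %data) {
    my $conditions = $data{$gene};
    next unless same_sign(values %$conditions);
    print join("\t", $gene, $_, $conditions->{$_}), "\n" for sort keys %$conditions;
}
```

With the sample data, only the three frz lines are printed, because wnt and bmp each mix positive and negative values.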

Thanks - easier to understand and works well! – fugu

How would I go about extending the while loop to a HoHoA? – fugu

@FlyingFrog: That depends on what you want to do. You should ask another question. I take it you have multiple values for the same gene/condition? – Borodin

1

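Rather than comparing each value with its neighbour, you can iterate over the whole list of values for a given gene, increment separate counters for positive and negative values, and then compare the counts to see whether a discrepancy exists.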

Assuming your data matches the following scheme:

'bmp' => HASH(0x7324710) 
    'condition1' => 10 
    'condition2' => 'inf' 
    'condition3' => 12 
    'condition4' => '-1' 
'frz' => HASH(0x7323c78) 
    'condition1' => '-12' 
    'condition2' => '-6' 
    'condition3' => '-0.3' 
'wnt' => HASH(0x72a5c30) 
    'condition1' => 1 
    'condition2' => 10 
    'condition3' => 15 
    'condition4' => '-1' 

This replacement for the last code block in your question will give you the result you need:

for my $gene (sort keys %hash) { 
    # These variables will contain: 
    # - Counts of positive and negative values 
    my ($pos_vals, $neg_vals) = (0, 0); 
    # - A true/false value indicating whether discrepancy exists 
    my $discrepant = undef; 
    # - A list of the values of all conditions for a given gene,
    #   collected from the inner hash for this gene
    my @values = values %{ $hash{$gene} };

    # For each such value, test for a leading - and increment 
    # the positive or negative value count accordingly 
    for (@values) { $_ =~ m/^-/ ? $neg_vals++ : $pos_vals++ }

    # If neither counter is zero (i.e. both evaluate true), then 
    # a discrepancy exists; otherwise, one doesn't -- either way, 
    # we put the test result in $discrepant so as to produce a 
    # cleaner test in the following if statement 
    $discrepant = (($pos_vals > 0) and ($neg_vals > 0)); 

    # In the absence of a discrepancy... 
    if (not $discrepant) { 
     # iterate over the conditions for this gene and print the gene 
     # name, the condition name, and the value 
     # NB: this is somewhat idiomatic Perl, but you'll tend to see 
     # it from time to time and it's thus worth knowing about 
     print "$gene\t$_\t$hash{$gene}->{$_}\n" 
      foreach sort keys %{ $hash{$gene} }; 
    }; 
} 

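NB: This will correctly handle positive and negative infinity, but it treats zero as positive, which may not be correct for your case. Do zero values occur in your data? If so, should they be treated as positive, negative, or neither?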
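
If zero does need separate treatment, one possible sketch is to classify each value numerically instead of testing for a leading minus sign. The helper names `sign_of` and `consistent_sign` are hypothetical, the input is assumed to contain only plain decimal numbers or the strings `inf` and `-inf`, and ignoring zeros is just one of several reasonable policies.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a value as 'neg', 'pos' or 'zero'.
# Assumes plain decimal numbers or the strings 'inf' / '-inf'.
sub sign_of {
    my ($value) = @_;
    return 'neg' if $value eq '-inf';
    return 'pos' if $value eq 'inf';
    return $value < 0 ? 'neg' : $value > 0 ? 'pos' : 'zero';
}

# True unless both a negative and a positive value were seen.
# Zeros are ignored here (an assumed policy, not a requirement).
sub consistent_sign {
    my %seen;
    $seen{ sign_of($_) }++ for @_;
    return !( $seen{neg} && $seen{pos} ) ? 1 : 0;
}

print consistent_sign(0, 5, 'inf') ? "consistent\n" : "discrepant\n";
print consistent_sign(-1, 0, 3)    ? "consistent\n" : "discrepant\n";
```

The first call prints "consistent" and the second prints "discrepant". Unlike the regex test, a value such as -0 is classified as zero, not as negative.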

This is very nice - thank you! – fugu

@FlyingFrog Glad to help! –

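This is an abuse of 'map', since 'for (@values) { $_ =~ m/^-/ ? $neg_vals++ : $pos_vals++ }' would do fine. Also, many people, myself included, would consider it an abuse of the conditional expression, because using it to modify its own arguments is bad practice. – Borodin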

-1

my @data = <$your_file_handle>; 

my %hash; 
foreach (@data){ 
    chomp; 
    my ($gene, $condition, $value) = split; #Sorry, your regex didn't work for me, 
              #hence the change. 
    $hash{$gene}{$condition} = $value; 
} 

for my $gene (sort keys %hash){ 
    my $values = join '', values %{ $hash{$gene} }; 
    my $num = %{$hash{$gene}}/1; #Number of conditions 

    #when no '-' is detected or number of '-' matches the one of conditions, print. 
    say $gene if ($values !~ /-/ or $values =~ tr/-/-/ == $num); 
} 

You can't use 'split' like that, because you don't know whether the data contains spaces. – Borodin

Also, the *complete data* is required as output, not just the gene names, and the number of elements in a hash is normally determined using 'my $num = keys %{$hash{$gene}}'. – Borodin

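1. You can check how 'split' is used and see which of our uses is more robust. 2. If he knows 'perl', he should be able to figure out how to produce the output. That is why I focused only on the "real problem". 3. Your vote? Heh... –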
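
The points raised in these comments can be combined into a short sketch: split on tabs only, count conditions with `keys`, and print each gene's complete data. The inline sample lines stand in for the input file and are an assumption, and this is only one way to arrange it.

```perl
use strict;
use warnings;

# Sample lines standing in for the tab-delimited input file (an assumption)
my @lines = (
    "wnt\tcondition1\t1",   "wnt\tcondition4\t-1",
    "frz\tcondition1\t-12", "frz\tcondition2\t-6",
);

my %hash;
for my $line (@lines) {
    chomp $line;
    # Split on tabs only, so fields containing spaces stay intact
    my ($gene, $condition, $value) = split /\t/, $line;
    $hash{$gene}{$condition} = $value;
}

for my $gene (sort keys %hash) {
    my $conditions = $hash{$gene};
    my $num        = keys %$conditions;                  # number of conditions
    my $negatives  = grep { /^-/ } values %$conditions;  # values with a leading '-'

    # Print the gene's complete data when no value, or every value, is negative
    next unless $negatives == 0 || $negatives == $num;
    print "$gene\t$_\t$conditions->{$_}\n" for sort keys %$conditions;
}
```

With these sample lines, only the two frz lines are printed, since wnt has one positive and one negative value.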