2012-11-06 51 views
0

How can I use a foreach control structure to normalize results in Perl?

I have output like this:

10dvex2_miRNA_ce.out.data|6361 
10dvex2_miRNA_ce.out.data|6361 
10dvex2_misc_RNA_ce.out.data|0 
10dvex2_rRNA_ce.out.data|239 

And this Perl script:

#!/usr/bin/perl 

use warnings; 
use strict; 

open(MYINPUTFILE, $ARGV[0]); # open for input 
my @lines = <MYINPUTFILE>; # read file into list 
my $count = 0; 
print "Frag"."\t"."ncRNA"."\t"."Amount"."\n"; 

foreach my $lines (@lines){
    my $pattern = $lines;
    $pattern =~ s/(.*)dvex\d_(.*)_(.*).(out.data)\|(.*)/$1 $2 $3 $5/g;
    $count += $5;
    print $1."\t".$2.$3."\t".$5."\n";
}
close(MYINPUTFILE); 
exit; 

With it I extract information like this:

Frag ncRNA Amount 
10 miRNAce 6361 
10 misc_RNAce 0 
10 rRNAce 239 

But in the Amount column I want to report each number divided by the total of all results (6600). In this case, I want output like this:

Frag ncRNA Amount 
10 miRNAce 0.964 
10 misc_RNAce 0 
10 rRNAce 0.036 

My problem is obtaining the total inside the loop so that I can normalize the data. Any ideas?

+0

['exit'](http://perldoc.perl.org/functions/exit.html) should only be used by experts.

Answers

1

Perhaps the following will be helpful:

use strict; 
use warnings; 

my (%hash, $total, %seen, @array); 

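while (<>) {
    next if $seen{$_}++;                              # skip identical lines
    # skip lines that do not match, so stale captures are never reused
    next unless /(\d+).+?_([^.]+).+\|(\d+)\s*$/;
    $hash{$1}{$2} = $3;                               # frag => ncRNA => count
    $total += $3;                                     # running total
}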

print "Frag\tncRNA\tAmount\n"; 

while (my ($key1, $val1) = each %hash) { 
    while (my ($key2, $val2) = each %$val1) { 
        my $frac = $val2/$total == 0 ? 0 : sprintf('%.3f', $val2/$total);
        push @array, "$key1\t$key2\t$frac\n";
    } 
} 

print map { $_->[0] } 
    sort { $b->[1] <=> $a->[1] } 
    map { [ $_, (split)[2] ] } 
    @array; 

Output:

Frag ncRNA Amount 
10 miRNA_ce 0.964 
10 rRNA_ce 0.036 
10 misc_RNA_ce 0 

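Identical lines are skipped, then the required elements are captured from each remaining line. A running total is kept for the later calculation. Your desired output appears to be sorted from high to low, which is why each record is pushed onto @array. If sorting isn't needed, however, you can print each line directly and omit the Schwartzian transform on @array.

Hope this helps!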
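The unsorted variant mentioned above might look something like the following. This is only a sketch: the helper name `normalize_in_order` and the inline `@input` list are assumptions made for illustration, and rows are kept in first-seen input order instead of being sorted. Fractions still cannot be printed inside the reading loop, because the total is not known until every line has been read.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer above): takes raw input lines and
# returns formatted rows in first-seen input order, with no sorting step.
sub normalize_in_order {
    my @lines = @_;
    my (%seen, @rows);
    my $total = 0;

    for my $line (@lines) {
        next if $seen{$line}++;                               # skip identical lines
        next unless $line =~ /(\d+).+?_([^.]+).+\|(\d+)\s*$/; # skip non-matching lines
        push @rows, [ $1, $2, $3 ];                           # frag, ncRNA, count
        $total += $3;                                         # running total
    }

    # The total is complete only now, so the fractions are computed here.
    return map {
        my ($frag, $type, $n) = @$_;
        my $frac = ($total && $n) ? sprintf('%.3f', $n / $total) : 0;
        "$frag\t$type\t$frac\n";
    } @rows;
}

# Sample lines copied from the question.
my @input = (
    '10dvex2_miRNA_ce.out.data|6361',
    '10dvex2_miRNA_ce.out.data|6361',
    '10dvex2_misc_RNA_ce.out.data|0',
    '10dvex2_rRNA_ce.out.data|239',
);

print "Frag\tncRNA\tAmount\n";
print normalize_in_order(@input);
```

Keeping the rows in an array of array references, rather than in a nested hash, is what preserves the input order here; `each` on a hash returns entries in no particular order.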

+0

Great solution! I can also control the number of significant digits... Thank you very much!

1

To do this, you need two passes over the data.

#! /usr/bin/env perl 

use warnings; 
use strict; 

print join("\t",qw'Frag ncRNA Amount'),"\n"; 

my @data; 
my $total = 0; 

# parse the lines 
while(<>){ 
    my @elem = /(.+?)(?>dvex)\d_(.+)_([^._]+)[.]out[.]data[|](\d+)/;
    next unless @elem; 

    # running total 
    $total += $elem[-1]; 

    # combine $2 and $3 
    splice @elem, 1, 2, $2.$3; # $elem[1].$elem[2]; 

    push @data, \@elem; 
} 

# print them 
for(@data){ 
    my @copy = @$_; 
    $copy[-1] = $copy[-1]/$total; 
    $copy[-1] = sprintf('%.3f', $copy[-1]) if $copy[-1]; 
    print join("\t",@copy),"\n"; 
}
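
Since the question asks specifically about foreach, the two-pass idea can also be sketched with two plain foreach loops over records that have already been parsed. The helper name `normalize_counts` and the hard-coded `@records` list are assumptions for illustration only; the values are the ones from the question, where the total is 6361 + 0 + 239 = 6600.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of [frag, ncRNA, count] records and
# returns new records whose last field is count / total.
sub normalize_counts {
    my @records = @_;

    # Pass 1: the total must be known before any fraction can be computed.
    my $total = 0;
    foreach my $rec (@records) {
        $total += $rec->[2];
    }

    # Pass 2: divide each count by the total, leaving zero counts as a plain 0.
    my @normalized;
    foreach my $rec (@records) {
        my ($frag, $type, $count) = @$rec;
        my $frac = ($total && $count) ? sprintf('%.3f', $count / $total) : 0;
        push @normalized, [ $frag, $type, $frac ];
    }
    return @normalized;
}

# Already-parsed sample records, using the values from the question.
my @records = (
    [ 10, 'miRNAce',    6361 ],
    [ 10, 'misc_RNAce', 0    ],
    [ 10, 'rRNAce',     239  ],
);

print join("\t", qw(Frag ncRNA Amount)), "\n";
print join("\t", @$_), "\n" foreach normalize_counts(@records);
```

The `$total &&` check is an extra guard, not part of the answer above: it avoids a division by zero when the input is empty or every count is 0.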