Perl: summing values in a sliding window over an array

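I want to create a moving window over an array of tab-delimited data, organized by the fourth column. For simplicity, I replaced the irrelevant fields with X and added the header shown in the first row: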

ID-Counts X  X  Start X  X  Locations  XXXX 
X-5000  [X] [X]  0  [X] [X]  1   [X...] 
X-26  [X] [X]  1  [X] [X]  1   [X...] 
X-34  [X] [X]  1  [X] [X]  0   [X...] 
X-3  [X] [X]  20  [X] [X]  9   [X...] 
X-200  [X] [X]  30  [X] [X]  0   [X...] 
X-1  [X] [X]  40  [X] [X]  5   [X...]

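The first column contains the ID joined to a count by a hyphen. The fourth column contains the start sites I want to use to group the data. The seventh column contains the number of locations, which I need in order to normalize the counts.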

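The value I want to total for each line is obtained by splitting the count off the ID and dividing it by the number of locations + 1. For example, the value for the first row is 2500, for row 2 it is 13, and for row 3 it is 34. I then want to sum these count/(locations + 1) values over every row whose column-4 value falls inside a 20-unit window, starting with values 0-19, then 1-20, 2-21, and so on. For example, window 0 (column-4 values 0-19) would sum rows 1-3, window 1 would sum rows 2-4, window 2 would sum row 4, and so on.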
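Here is a minimal sketch of the per-row arithmetic. It assumes whitespace-separated fields laid out as in the sample, with ID-count in column 1, Start in column 4 and Locations in column 7. `row_value` is a hypothetical helper name, not part of the asker's script.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one whitespace-separated data line into
# (Start, count / (Locations + 1)). Assumes column 1 is "ID-count",
# column 4 is Start and column 7 is Locations, as in the sample data.
sub row_value {
    my ($line) = @_;
    my @f = split ' ', $line;    # ' ' splits on any run of spaces or tabs
    # Take the count after the LAST hyphen, in case an ID contains hyphens.
    my ($id, $count) = $f[0] =~ /\A(.*)-(\d+)\z/
        or die "Unexpected ID-count field: $f[0]\n";
    my ($start, $locations) = @f[3, 6];    # columns 4 and 7 (0-based 3 and 6)
    return ($start, $count / ($locations + 1));
}

for my $line ('X-5000  [X] [X]  0  [X] [X]  1   [X...]',
              'X-26  [X] [X]  1  [X] [X]  1   [X...]',
              'X-34  [X] [X]  1  [X] [X]  0   [X...]') {
    my ($start, $value) = row_value($line);
    print "$start\t$value\n";    # prints 0 2500, 1 13, 1 34 (tab-separated)
}
```

The regex approach takes the count after the last hyphen rather than the first, so a real ID that contains hyphens would still parse.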

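My ideal output is two columns. The first holds the start of each 20-unit window (0, 1, 2, ...), and the second holds that window's sum (2547, 47.3, etc. for the data above).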
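The following brute-force sketch produces that two-column output. It assumes the per-row (Start, count/(locations + 1)) pairs have already been computed; they are hard-coded here from the sample rows. For every window start from 0 up to the largest Start, it adds each row whose Start lies in [start, start + 19].

```perl
use strict;
use warnings;
use List::Util qw(max);

# Per-row (Start, count/(Locations + 1)) pairs for the six sample rows,
# hard-coded here for illustration.
my @rows  = ([0, 2500], [1, 13], [1, 34], [20, 0.3], [30, 200], [40, 1/6]);
my $width = 20;    # window w covers Start values w .. w + 19

my $max_start = max map { $_->[0] } @rows;

my @window_sum;
for my $w (0 .. $max_start) {
    my $sum = 0;
    for my $r (@rows) {
        # add the row if its Start lies inside [w, w + width - 1]
        $sum += $r->[1] if $r->[0] >= $w && $r->[0] < $w + $width;
    }
    $window_sum[$w] = $sum;
}

# two columns: window start, window sum
printf "%d\t%g\n", $_, $window_sum[$_] for 0 .. $max_start;
```

The first three lines of output are 2547, 47.3 and 0.3 for windows 0, 1 and 2, which matches the description above. The cost is (number of windows) × (number of rows), which is fine for small inputs.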

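I wrote a Perl script that filters the data and organizes it into this format, and I would like to add code that does the summing over 20-unit windows. As a Perl novice, I would appreciate any help and explanation. I'm comfortable splitting columns and doing arithmetic across them, but I'm completely lost on how to do that over a moving window in an array. Thanks.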
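For the "moving window over an array" part, one common technique is a two-pointer sliding window. It is a swapped-in technique and not taken from any answer below. The idea is to sort the rows by Start once. Then, as the window start advances by one, the code adds the rows that enter the window and subtracts the rows that leave it. The (Start, value) pairs are hard-coded for illustration, and deliberately unsorted.

```perl
use strict;
use warnings;

# Hypothetical input: [Start, count/(Locations + 1)] pairs, unsorted.
my @rows  = ([20, 0.3], [0, 2500], [1, 34], [40, 1/6], [1, 13], [30, 200]);
my $width = 20;

# Sort by Start so each row enters and leaves the window exactly once.
my @sorted = sort { $a->[0] <=> $b->[0] } @rows;

my ($lo, $hi, $running) = (0, 0, 0);    # rows $lo .. $hi-1 are inside the window
my @window_sum;
for my $w (0 .. $sorted[-1][0]) {
    # admit rows whose Start is now <= w + width - 1
    while ($hi < @sorted && $sorted[$hi][0] < $w + $width) {
        $running += $sorted[$hi++][1];
    }
    # evict rows whose Start has dropped below w
    while ($lo < $hi && $sorted[$lo][0] < $w) {
        $running -= $sorted[$lo++][1];
    }
    $window_sum[$w] = $running;
}

printf "%d\t%g\n", $_, $window_sum[$_] for 0 .. $#window_sum;
```

Each row is touched twice, so the work is roughly linear after the sort. Repeatedly adding and subtracting floating-point values can leave tiny rounding residue, such as 47.3000000000002 instead of 47.3. Format the output with `%g` or `%.2f` instead of comparing the sums exactly.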


2012-11-21 Adam Whisnant

The important part is still unclear. Could you try to explain how you want to "sum everything up"? – memowe

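Edited; hopefully it makes more sense now. Basically, I want to find the column-4 values that fall within a 20-unit range, add up their respective count/(loc + 1) values, and do this for every 20-unit grouping (column-4 ranges 0-19, 1-20, 2-21, ...). –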

I hope I have understood your question correctly. What do you think of these implementations?

Solution 1: write to the output file each time a unit window (20) is filled.

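# Assuming that you have an array of sums (@sums) and the name of a file ($filename).
# Groups consecutive elements of @sums into non-overlapping blocks of 20.
use File::Slurp qw(append_file);   # assumption: the answer does not say where its file helper comes from

my $window_no  = 20;
my $window_sum = 0;
my @window_nos = ();

for my $i (0 .. $#sums) {
    push @window_nos, $i;
    $window_sum += $sums[$i];              # accumulate the current block's total
    if (@window_nos == $window_no) {       # block is full: append it and reset
        append_file($filename, join(',', @window_nos) . "\t" . $window_sum . "\n");
        $window_sum = 0;
        @window_nos = ();
    }
}

# Append any final, partially filled block.
if (@window_nos) {
    append_file($filename, join(',', @window_nos) . "\t" . $window_sum . "\n");
}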

Solution 2: accumulate the output in a scalar variable and write it to the output file once.

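# Assuming that you have an array of sums (@sums) and the name of a file ($filename).
# Groups consecutive elements of @sums into non-overlapping blocks of 20.
use File::Slurp qw(write_file);    # assumption: the answer does not say where write_file comes from

my $window_no     = 20;
my $window_sum    = 0;
my @window_nos    = ();
my $file_contents = '';

for my $i (0 .. $#sums) {
    push @window_nos, $i;
    $window_sum += $sums[$i];              # accumulate the current block's total
    if (@window_nos == $window_no) {       # block is full: buffer it and reset
        $file_contents .= join(',', @window_nos) . "\t" . $window_sum . "\n";
        $window_sum = 0;
        @window_nos = ();
    }
}

# Buffer any final, partially filled block.
if (@window_nos) {
    $file_contents .= join(',', @window_nos) . "\t" . $window_sum . "\n";
}

write_file($filename, $file_contents);     # single write at the end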
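Both solutions write through a file helper that is not part of core Perl. It is presumably from CPAN's File::Slurp, though the answer does not say. If you would rather stay with core Perl, here is a minimal sketch of the same write-once step using `open` and `printf`. `write_windows` is a hypothetical helper name, and the sums passed to it are illustrative values.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write one "window_start<TAB>sum" line per window,
# using only core Perl (no File::Slurp).
sub write_windows {
    my ($filename, $sums) = @_;    # $sums: array ref indexed by window start
    open my $fh, '>', $filename or die "Cannot open $filename: $!";
    for my $start (0 .. $#$sums) {
        printf {$fh} "%d\t%g\n", $start, $sums->[$start];
    }
    close $fh or die "Cannot close $filename: $!";
    return;
}

# Usage: write to a temporary file that is removed when the program exits.
my (undef, $out_file) = tempfile(UNLINK => 1);
write_windows($out_file, [2547, 47.3, 0.3]);
```

Checking the `close` result matters here, because buffered write errors (for example a full disk) are only reported when the handle is closed.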


2012-11-21 02:45:54 Carlisle18

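Take a look at the code below and see whether it does what you want. It could probably be optimized, but I basically did a brute-force search over all starts that fall within the 20-unit window beginning at the current start.
Ken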

Output:

0-19: 2547.000000 
1-20: 47.300000 
20-39: 200.300000 
30-49: 200.166667 
40-59: 0.166667

Code:

use strict; 
use warnings; 

# Hash indexed by Start 
# Each value contains the sum of all (Counts/Locations+1) for 
#  this Start value 
my %sum; 

while (<DATA>) 
{ 
    # ignore comments 
    next if /^\s*#/; 
    my ($id_count,undef,undef,$start,undef,undef,$numLocations) = 
     split ' '; 
    my ($id,$count) = split '-',$id_count; 
    $sum{$start} += $count/($numLocations + 1); 
} 

# Numeric sort: a plain "sort keys" compares as strings (e.g. 100 before 20).
foreach my $start (sort { $a <=> $b } keys %sum) 
{ 
    my $totalSum = 0; 
    # Could probably be optimized. 
    foreach my $start2 ($start .. $start+19) 
    { 
     $totalSum += $sum{$start2} if defined($sum{$start2});  
    } 
    printf "%d-%d: %f\n", $start, $start+19, $totalSum; 
} 

__DATA__ 
#ID-Counts X  X  Start X  X  Locations  XXXX 
X-5000  [X] [X]  0  [X] [X]  1   [X...] 
X-26  [X] [X]  1  [X] [X]  1   [X...] 
X-34  [X] [X]  1  [X] [X]  0   [X...] 
X-3  [X] [X]  20  [X] [X]  9   [X...] 
X-200  [X] [X]  30  [X] [X]  0   [X...] 
X-1  [X] [X]  40  [X] [X]  5   [X...]
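Ken's loop reports only the windows that begin at a Start value present in the data, and it rescans 20 hash slots for each one. If you want every window start 0, 1, 2, ... as the question asks, one option is prefix sums, which is a swapped-in technique. The sketch builds a running total over the per-Start hash once and takes differences. It hard-codes the `%sum` values that the loop above builds from the sample data.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Per-Start totals, as the %sum hash above holds them for the sample data.
my %sum   = (0 => 2500, 1 => 47, 20 => 0.3, 30 => 200, 40 => 1/6);
my $width = 20;
my $max_start = max keys %sum;

# $prefix[$i] = total of all Starts strictly below $i.
my @prefix = (0);
for my $s (0 .. $max_start) {
    $prefix[$s + 1] = $prefix[$s] + ($sum{$s} // 0);
}

# Window [w, w + width - 1] is a difference of two prefix sums; clamp the
# upper end because no Start exceeds $max_start.
my @window_sum;
for my $w (0 .. $max_start) {
    my $end = $w + $width;
    $end = $max_start + 1 if $end > $max_start + 1;
    $window_sum[$w] = $prefix[$end] - $prefix[$w];
}

printf "%d-%d: %f\n", $_, $_ + $width - 1, $window_sum[$_] for 0 .. $max_start;
```

Each window then costs one subtraction, whatever the window width. The `//` operator requires Perl 5.10 or later.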


2012-11-22 00:30:30

This code has several problems. I'm not sure where I copied it from. See the answer below. –

How about this?

#!/usr/bin/perl -Tw 

use strict; 
use warnings; 
use Data::Dumper; 

my %sum_for; 

while (my $line = <DATA>) { 

    if ($line !~ m{\A [#] }xms) { 

     $line =~ s{\A \s* ([^-]+) - }{$1 }xms; # separate the ID 

     my @columns = split /\s+/, $line; # assumes no space in values 

     my $count = $columns[1]; 
     my $start = $columns[4]; 
     my $locat = $columns[7] + 1; 

     $sum_for{$start} += $count/$locat; 
    } 
} 

print Dumper(\%sum_for); 

my @start_ranges; 
{ 
    my ($max_start) = sort { $b <=> $a } keys %sum_for; 

    # max => range count 
    # 10 => 1 
    # 20 => 2 
    # 30 => 2 
    # 40 => 3 
    # 50 => 3 
    # ... 
    my $range_count = $max_start/20; 

    push @start_ranges, [ 0, 19 ]; 

    for (1 .. $range_count) { 

     push @start_ranges, [ map { $_ + 20 } @{ $start_ranges[-1] } ]; 
    } 
} 

my %total_for; 

for my $range_ra (@start_ranges) { 

    my $range_key = sprintf '%d-%d', @{$range_ra}; 

    for my $start ($range_ra->[0] .. $range_ra->[1]) { 

     if (exists $sum_for{$start}) { 

      $total_for{$range_key} += $sum_for{$start}; 
     } 
    } 
} 

print Dumper(\%total_for); 

__DATA__ 
#ID-Counts X  X  Start X  X  Locations  XXXX 
X-5000  [X] [X]  0  [X] [X]  1   [X...] 
X-26  [X] [X]  1  [X] [X]  1   [X...] 
X-34  [X] [X]  1  [X] [X]  0   [X...] 
X-3  [X] [X]  20  [X] [X]  9   [X...] 
X-200  [X] [X]  30  [X] [X]  0   [X...] 
X-1  [X] [X]  40  [X] [X]  5   [X...]

The output looks like this:

$VAR1 = { 
      '1' => 47, 
      '40' => '0.166666666666667', 
      '0' => 2500, 
      '30' => 200, 
      '20' => '0.3' 
     }; 
$VAR1 = { 
      '40-59' => '0.166666666666667', 
      '20-39' => '200.3', 
      '0-19' => 2547 
     };

The part that computes the start ranges took a bit of fiddling. Thanks for the interesting question.


2012-11-22 16:25:27 ddoxey
