2012-01-31 48 views
3

Relative record separator in Perl

I have data that looks like this:

id:40108689 -- 
chr22_scrambled_bysegments:10762459:F : chr22:17852459:F (1.0), 
id:40108116 -- 
chr22_scrambled_bysegments:25375481:F : chr22_scrambled_bysegments:25375481:F (1.0), 
chr22_scrambled_bysegments:25375481:F : chr22:19380919:F (1.0), 
id:1 -- 
chr22:21133765:F : chr22:21133765:F (0.0), 

So each record is delimited by a line of the form id:[somenumber] --

What is the way to read this data so that we end up with a hash of arrays, like this:

$VAR = {
    'id:40108689' => [ 'chr22_scrambled_bysegments:10762459:F : chr22:17852459:F (1.0),' ],
    'id:40108116' => [
        'chr22_scrambled_bysegments:25375481:F : chr22_scrambled_bysegments:25375481:F (1.0),',
        'chr22_scrambled_bysegments:25375481:F : chr22:19380919:F (1.0),',
    ],
    # ...etc
};

I tried to handle this with the record separator, but I am not sure how to generalize it:

{ 
    local $/ = " --\n"; # How to include variable content id:[number] ? 

    while ($content = <INFILE>) { 
     chomp $content; 
     print "$content\n" if $content; # Skip empty records 
    } 
} 

Answers

6
my $result = {}; 
my $last_id; 
while (my $line = <INFILE>) { 
    if ($line =~ /(id:\d+) --/) { 
     $last_id = $1; 
     next; 
    } 
    next unless $last_id; # Just in case the file doesn't start with an id line 

    push @{ $result->{$last_id} }, $line; 
} 

use Data::Dumper; 
print Dumper $result; 

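Use the normal record separator.

Use $last_id to keep track of the last id line encountered, and update it whenever another id line comes along. Push each non-id line onto the array stored under the hash key of the most recently matched id line.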
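For anyone who wants to try the approach without a data file, here is a minimal self-contained sketch of the same idea. It is not the answer's exact code: the in-memory filehandle, the chomp, the ^ anchor on the pattern, and the blank-line check are all assumptions added for illustration, and the sample input is copied from the question.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sample input from the question. An in-memory filehandle stands in for
# INFILE (an assumption, so the sketch runs without an external file).
my $data = <<'END';
id:40108689 --
chr22_scrambled_bysegments:10762459:F : chr22:17852459:F (1.0),
id:40108116 --
chr22_scrambled_bysegments:25375481:F : chr22_scrambled_bysegments:25375481:F (1.0),
chr22_scrambled_bysegments:25375481:F : chr22:19380919:F (1.0),
id:1 --
chr22:21133765:F : chr22:21133765:F (0.0),
END

open my $in, '<', \$data or die "Cannot open in-memory handle: $!";

my %result;
my $last_id;
while (my $line = <$in>) {
    chomp $line;                      # drop the newline (not done in the answer)
    if ($line =~ /^(id:\d+) --/) {    # id line: start a new record
        $last_id = $1;
        $result{$last_id} = [];
        next;
    }
    next unless defined $last_id;     # ignore anything before the first id line
    next unless $line =~ /\S/;        # skip blank lines
    push @{ $result{$last_id} }, $line;
}
close $in;

$Data::Dumper::Sortkeys = 1;          # stable key order in the dump
print Dumper(\%result);
```

Reading line by line keeps $/ at its default, which sidesteps the original problem: $/ is treated as a literal string, not a pattern, so it cannot match a separator whose id number varies.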

+0

Thanks. But I think you need a small correction like this: 'if ($line !~ /id:\d+ --/) { push @{ $result->{$last_id} }, $line; }' – neversaint 2012-01-31 05:27:49

+1

Oops, good catch! Corrected the code example. – 2012-01-31 05:32:58