2014-01-18
0

Storing information in a hash in Perl

I have this file:

313-9640000-9660000:19634:fwd maker gene 1978 7195 .  +  .  ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10;Name=maker-313-9640000-9660000%253A19634%253Afwd-augustus-gene-0.10 
313-9640000-9660000:19634:fwd maker mRNA 1978 7195 .  +  .  ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1;Name=maker-313-9640000-9660000%253A19634%253Afwd-augustus-gene-0.10-mRNA-1;Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10 
313-9640000-9660000:19634:fwd maker exon 1978 2207 0.48 +  .  Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1 
313-9640000-9660000:19634:fwd maker exon 3081 3457 0.48 +  .  Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1 
313-9640000-9660000:19634:fwd maker exon 3535 3700 0.48 +  .  Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1 
313-9640000-9660000:19634:fwd maker exon 4247 4391 0.48 +  .  ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1:exon:2;Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1 
313-9640000-9660000:19634:fwd maker exon 6766 7195 0.48 +  .  Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1 
313-9640000-9660000:19634:fwd maker CDS  3267 3457 .  +  0  ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1:cds:0;Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1 
313-9640000-9660000:19634:fwd maker CDS  3535 3700 .  +  .  Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1 
313-9640000-9660000:19634:fwd maker CDS  4247 4391 .  +  .  ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1:cds:2;Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1 
313-9640000-9660000:19634:fwd maker CDS  6766 7106 .  +  .  ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1:cds:3;Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1 

The most important part here is column 3 (gene, mRNA, ...).

So I want to build a hash where key = gene, mRNA, ... and value = the whole line.

I have tried this:

%features =(); 
while ($line = <>) { 
    chomp; 
    my @gff_data = split /\t+/; 
    $features{$gff_data[2]} = @gff_data; 
} 
for my $key (sort keys %features) { 
    print "$key = $features{$key}\n"; 
} 

But it doesn't work...

+0

$gff_data[2] is not unique, which it has to be when it is used as a hash key. –

+0

So I can't group gene, mRNA, ... together? – user2886545

+0

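You can't store all of this information in a "typical" key => value hash where both the key and the value are string literals. Keys must be unique. You can store the information in a hash like this: key => (value1, value2, value3 ... valueN), so that the value is a list rather than a string. –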
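
A minimal sketch of the structure this comment describes. The @records list and its contents are made up for illustration and are not taken from the file above. It contrasts assigning an array to a hash element, which stores only the element count and is replaced on every repeated key, with pushing onto an array reference, which keeps every value:

```perl
use strict;
use warnings;

# Hypothetical sample records: [feature type, start coordinate].
my @records = ( [ exon => 1978 ], [ exon => 3081 ], [ CDS => 3267 ] );

my (%overwritten, %grouped);
for my $rec (@records) {
    my ($type, $start) = @$rec;

    # An array assigned to a scalar slot yields its element count (2 here),
    # and each repeated key replaces the previous value.
    $overwritten{$type} = @$rec;

    # The value is an array reference, so repeated keys accumulate.
    push @{ $grouped{$type} }, $start;
}

print "exon => $overwritten{exon}\n";     # prints the count: exon => 2
print "exon => @{ $grouped{exon} }\n";    # prints: exon => 1978 3081
```

Note also that the code in the question reads each line into $line, while a bare chomp and a split without a string argument both operate on $_, so the fields are never taken from the line that was read.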

Answers

1

If you want to group the lines by column 3,

my %features; 
while (<>) { 
    chomp; 
    my @gff_data = split /\t+/; 
    push @{ $features{$gff_data[2]} }, \@gff_data; 
} 

use Data::Dumper; 
print Dumper \%features; 
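
One possible way to read such a structure back out, sketched with shortened, made-up tab-separated lines in place of the real file (the @lines array is an assumption for illustration). Each hash value is an array reference, and each of its elements is a reference to the split fields of one line:

```perl
use strict;
use warnings;

# Shortened, made-up GFF-like lines (tab-separated) standing in for the file.
my @lines = (
    "seq1\tmaker\tgene\t1978\t7195",
    "seq1\tmaker\texon\t1978\t2207",
    "seq1\tmaker\texon\t3081\t3457",
);

# Group by column 3: feature type => list of field arrays.
my %features;
for my $line (@lines) {
    my @gff_data = split /\t+/, $line;
    push @{ $features{ $gff_data[2] } }, \@gff_data;
}

# Walk the two levels: rows per feature type, then fields per row.
for my $type (sort keys %features) {
    my $rows = $features{$type};
    printf "%s: %d line(s)\n", $type, scalar @$rows;
    for my $fields (@$rows) {
        print "  start=$fields->[3] end=$fields->[4]\n";
    }
}
```

Because @gff_data is declared with my inside the loop, every iteration stores a reference to a fresh array, so earlier rows are not overwritten.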
1

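Are there any spaces embedded in any of the field values? I assume not.

The values in column 3 aren't unique, so presumably you want an array of data lines for each distinct key.

If the current record is in $_ then the value of the third column is (split)[2]

Something like this will do.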

use strict; 
use warnings; 

use Data::Dump; 

open my $fh, '<', 'myfile.txt' or die $!; 

my %data; 
while (<$fh>) { 
    chomp; 
    push @{ $data{(split)[2]} }, $_; 
} 

dd \%data; 

Output

{ 
    CDS => [ 
      "313-9640000-9660000:19634:fwd maker CDS 3267 3457 . + 0 ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1:cds:0;Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1", 
      "313-9640000-9660000:19634:fwd maker CDS 3535 3700 . + . Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1", 
      "313-9640000-9660000:19634:fwd maker CDS 4247 4391 . + . ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1:cds:2;Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1", 
      "313-9640000-9660000:19634:fwd maker CDS 6766 7106 . + . ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1:cds:3;Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1", 
      ], 
    exon => [ 
      "313-9640000-9660000:19634:fwd maker exon 1978 2207 0.48 + . Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1", 
      "313-9640000-9660000:19634:fwd maker exon 3081 3457 0.48 + . Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1", 
      "313-9640000-9660000:19634:fwd maker exon 3535 3700 0.48 + . Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1", 
      "313-9640000-9660000:19634:fwd maker exon 4247 4391 0.48 + . ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1:exon:2;Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1", 
      "313-9640000-9660000:19634:fwd maker exon 6766 7195 0.48 + . Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1", 
      ], 
    gene => [ 
      "313-9640000-9660000:19634:fwd maker gene 1978 7195 . + . ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10;Name=maker-313-9640000-9660000%253A19634%253Afwd-augustus-gene-0.10", 
      ], 
    mRNA => [ 
      "313-9640000-9660000:19634:fwd maker mRNA 1978 7195 . + . ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1;Name=maker-313-9640000-9660000%253A19634%253Afwd-augustus-gene-0.10-mRNA-1;Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10", 
      ], 
} 
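
As a rough illustration of how the grouped lines might then be used, here is a self-contained sketch. The @records list is made-up, shortened sample data, not the real file. It builds the same kind of hash with the (split)[2] list slice, counts the lines per feature type, and fetches a single stored line:

```perl
use strict;
use warnings;

# Made-up, shortened records standing in for lines of the real file.
my @records = (
    "seq1 maker exon 1978 2207",
    "seq1 maker CDS 3267 3457",
    "seq1 maker exon 3081 3457",
);

my %data;
for (@records) {
    # split with no arguments splits $_ on whitespace; [2] picks the third field.
    push @{ $data{ (split)[2] } }, $_;
}

# Report how many lines were stored under each feature type.
for my $type (sort keys %data) {
    my $count = @{ $data{$type} };
    print "$type: $count\n";
}

# Fetch one stored line: the second exon record.
print "$data{exon}[1]\n";
```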
+0

Hey, could you finally learn to leave a comment when you downvote? –

+0

@mpapec: Are you talking to me? – Borodin

+0

What does it look like? –

0

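You can't use the third field directly as your hash key, because the same name occurs several times in the input's third field (exon, for example, appears 5 times) and hash keys are always unique. Try the code below, which is also easy to understand.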

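use strict;
use warnings;

# Read the whole file into memory, one line per element, without newlines.
open(my $in, '<', 'input.txt') or die "Couldn't open file, $!";
chomp(my @data = <$in>);
close $in;

# Flat list of alternating key (third field) and value (whole line).
my @finalData;
foreach my $line (@data)
{
    my @newData = split ' ', $line;
    push(@finalData, $newData[2], $line);
}

# Step through the list two elements at a time: key, then value.
for (my $i = 0; $i < @finalData; $i += 2)
{
    print "Key => $finalData[$i]\nValue => $finalData[$i+1]\n";
}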