2014-01-10

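Using Perl's seek to jump to a line in a file and continue reading the file

My goal is to open a file that contains a single column of fixed length (1 character = 2 bytes on my Mac), and then read the lines of the file into an array, starting and ending at specified points. The file is very long, so I use the seek command to jump to the appropriate starting line. The file is a chromosome sequence, arranged as a single column. I successfully jump to the right place in the file, but I cannot read the sequence into the array.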

my @seq =(); # to contain the stretch of sequence I am seeking to retrieve from file. 
my $from_bytes = 2*$from - 2; # specifies the "start point" in terms of bytes. 
seek(SEQUENCE, $from_bytes, 0); 
my $from_base = <SEQUENCE>; 
push (@seq, $from_base); # script is going to the correct line and retrieving correct base. 

my $count = $from + 1; # here I am trying to continue the read into @seq 
while (<SEQUENCE>) { 
     if ($count = $to) { # $to specifies the line at which to stop 
       last; 
     } 

     else { 
      push(@seq, $_); 
      $count++; 
      next; 
     } 
} 
print "seq is: @seq\n\n"; # script prints only the first base 

'if ($count = $to)' is an assignment, not a comparison. Use 'if ($count == $to)' instead, or better, 'if ($count >= $to)'. – ThisSuitIsBlackNot
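A minimal sketch of the question's loop with the comment's fix applied, assuming Unix line endings so every line is exactly 2 bytes (one base plus "\n"). The helper name read_bases, the 1-based inclusive $from..$to convention, the chomp, and the temporary demo file are illustrative assumptions, not the asker's code. The loop uses a numeric '>' test so that line $to itself is included.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read lines $from..$to (1-based, inclusive) from a file
# where every line is exactly 2 bytes (one base + "\n", Unix line endings).
sub read_bases {
    my ($path, $from, $to) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    seek($fh, 2 * $from - 2, 0) or die "seek failed: $!";  # start of line $from
    my @seq;
    my $count = $from;                 # line number of the line about to be read
    while (my $line = <$fh>) {
        last if $count > $to;          # numeric comparison, not an assignment
        chomp $line;                   # drop the trailing newline
        push @seq, $line;
        $count++;
    }
    close $fh;
    return @seq;
}

# Demo with a small throwaway file containing one base per line.
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
print {$tmp} "$_\n" for qw(A C G T T A);
close $tmp;

my @seq = read_bases($tmpname, 2, 4);
print "seq is: @seq\n";    # seq is: C G T
```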

Answer

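It looks like you are reading fixed-width records made up of $to lines, each of which is 2 bytes long (1 character + 1 newline). So you can simply use a single read to fetch each chromosome sequence. A short example: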

use strict; 
use warnings; 
use autodie; 

my $record_number = $ARGV[0];                  # 0-based index of the record to fetch
my $lines_per_record = 4;                      # change to the correct value
my $record_length = $lines_per_record * 2;     # 2 bytes per line: 1 base + newline
my $offset = $record_length * $record_number;  # byte offset where the record starts

my $fasta_test = "fasta_test.txt"; 

if (open my $SEQUENCE, '<', $fasta_test) { 
    my $sequence_string; 
    seek $SEQUENCE, $offset, 0; 
    # read() returns the number of characters read (0 at end of file)
    my $chars_read = read($SEQUENCE, $sequence_string, $record_length); 
    if ($chars_read) { 
        my @seq = split /\n/, $sequence_string; # if you want it as an array 
        $sequence_string =~ s/\n//g;            # if you want the chromosome sequence as a single string without newlines 
        print $sequence_string, "\n"; 
    } else { 
        print STDERR "Failed to read record $record_number!\n"; 
    } 

    close $SEQUENCE; 
} 
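The same single-read idea can be pointed at the question's $from..$to range directly. This is a sketch under the same assumption as above (every line is exactly 2 bytes, Unix line endings); the helper name read_range and the temporary demo file are hypothetical. The byte arithmetic is: line n starts at byte 2*(n-1), and lines $from..$to span 2*($to - $from + 1) bytes.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: fetch bases $from..$to (1-based, inclusive) with one
# seek and one read, assuming every line is exactly 2 bytes (base + "\n").
sub read_range {
    my ($path, $from, $to) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $offset = 2 * ($from - 1);          # byte where line $from starts
    my $length = 2 * ($to - $from + 1);    # bytes covering lines $from..$to
    seek($fh, $offset, 0) or die "seek failed: $!";
    my $got = read($fh, my $buf, $length); # may return fewer bytes near EOF
    die "read failed: $!" unless defined $got;
    close $fh;
    $buf =~ s/\n//g;                       # drop the newlines, keep the bases
    return $buf;
}

# Demo with a small throwaway file containing one base per line.
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
print {$tmp} "$_\n" for qw(A C G T T A);
close $tmp;

print read_range($tmpname, 2, 4), "\n";   # CGT
```

Compared with a line-by-line loop, this issues one read call regardless of how long the range is, which matters for a very long chromosome file.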

With more information, a better solution might be possible.
