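Using Perl's seek to jump to a line in a file and continue reading the file

My goal is to open a file that contains a single fixed-width column (1 character = 2 bytes on my Mac) and read its lines into an array, starting and ending at specified points. The file is very long, so I use the seek command to jump to the appropriate starting line. The file is a chromosome sequence arranged in a single column. I successfully jump to the right place in the file, but I cannot read the sequence into the array.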

my @seq =(); # to contain the stretch of sequence I am seeking to retrieve from file. 
my $from_bytes = 2*$from - 2; # specifies the "start point" in terms of bytes. 
seek(SEQUENCE, $from_bytes, 0); 
my $from_base = <SEQUENCE>; 
push (@seq, $from_base); # script is going to the correct line and retrieving correct base. 

my $count = $from + 1; # here I am trying to continue the read into @seq 
while (<SEQUENCE>) { 
     if ($count = $to) { # $to specifies the line at which to stop 
       last; 
     } 

     else { 
      push(@seq, $_); 
      $count++; 
      next; 
     } 
} 
print "seq is: @seq\n\n"; # script prints only the first base

Source

2014-01-10 ES55

'if ($count = $to)' is an assignment. Use 'if ($count == $to)' instead, or better, 'if ($count >= $to)' – ThisSuitIsBlackNot
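
To illustrate the comment above, here is a minimal sketch of the question's loop with the comparison fixed. The helper name read_bases and the in-memory test data are hypothetical, and the sketch assumes every line is exactly 2 bytes (one base plus a newline), as described in the question. Like the original loop, it tests the counter before pushing, so it stops before line $to rather than including it.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns lines
# $from .. $to-1 (1-based), assuming each line is exactly 2 bytes.
sub read_bases {
    my ($fh, $from, $to) = @_;
    my @seq;
    seek($fh, 2 * $from - 2, 0) or die "seek failed: $!";
    my $count = $from;
    while (my $line = <$fh>) {
        last if $count >= $to;    # comparison, not assignment
        chomp $line;
        push @seq, $line;
        $count++;
    }
    return @seq;
}

# Demo on an in-memory file holding one base per line (assumed test data).
my $data = "A\nC\nG\nT\nA\nC\nG\n";
open my $fh, '<', \$data or die "open failed: $!";
print join('', read_bases($fh, 3, 6)), "\n";    # prints "GTA"
close $fh;
```

With 'if ($count = $to)', the condition assigns $to to $count and is true whenever $to is non-zero, so the loop exits on its first iteration. That would explain why only the first base was printed.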

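It seems you are reading fixed-width records, each consisting of $to lines, with each line taking 2 bytes (1 character + 1 newline). As such, you can simply read each chromosome sequence with a single read. A short example: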

use strict; 
use warnings; 
use autodie; 

my $record_number = $ARGV[0]; 
my $lines_per_record = 4; # change to the correct value 
my $record_length = $lines_per_record * 2; 
my $offset   = $record_length * $record_number; 

my $fasta_test = "fasta_test.txt"; 

if (open my $SEQUENCE, '<', $fasta_test) { 
    my $sequence_string; 
    seek $SEQUENCE, $offset, 0; 
    my $chars_read = read($SEQUENCE, $sequence_string, $record_length); 
    if ($chars_read) { 
     my @seq = split /\n/, $sequence_string; # if you want it as an array 
     $sequence_string =~ s/\n//g; # if you want the chromosome sequence as a single string without newlines 
     print $sequence_string, "\n"; 
    } else { 
     print STDERR "Failed to read record $record_number!\n"; 
    } 

    close $SEQUENCE; 
}
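
The same single-read idea can also be applied directly to the $from and $to line numbers used in the question. This is only a sketch under stated assumptions: the helper name read_range and its $line_bytes parameter are hypothetical, lines are numbered from 1, the range is inclusive, and every line is assumed to have the same byte length (2 here; a file with CRLF line endings would need 3).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): fetch lines
# $from .. $to (1-based, inclusive) with one seek and one read,
# assuming every line is exactly $line_bytes bytes long.
sub read_range {
    my ($fh, $from, $to, $line_bytes) = @_;
    my $offset = ($from - 1) * $line_bytes;
    my $length = ($to - $from + 1) * $line_bytes;
    seek($fh, $offset, 0) or die "seek failed: $!";
    my $got = read($fh, my $buffer, $length);
    die "read failed: $!" unless defined $got;
    return split /\n/, $buffer;    # one array element per line
}

# Demo on an in-memory file holding one base per line (assumed test data).
my $data = "A\nC\nG\nT\nA\nC\nG\n";
open my $fh, '<', \$data or die "open failed: $!";
print join('', read_range($fh, 3, 6, 2)), "\n";    # prints "GTAC"
close $fh;
```

Because the offset and length are computed from the line numbers, no per-line counter is needed, which avoids the comparison bug altogether.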

With more information, a better solution could probably be provided.

Source

2014-01-11 02:01:31
