How to group a series of consecutive numbers in Perl
I have a data input that looks like this:
seq 75 T G -
seq 3185 A R +
seq 3382 A R +
seq 4923 C - + *
seq 4924 C - + *
seq 4925 T - + *
seq 5252 A W +
seq 7400 T C -
seq 16710 C - - #
seq 18248 T C -
seq 18962 C - + *
seq 18963 A - + *
seq 18964 T - + *
seq 18965 A - + *
seq 19566 A M +
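The input above is already sorted by the 2nd column.
What I want to do is:
- Take only the lines where the 4th column is "-".
- If those lines have consecutive positions (2nd column), group them.
- Represent each group as one new line, with the lowest position as the new position and the concatenation of the grouped letters as the new string.
So we expect output like this: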
seq 75 T G -
seq 3185 A R +
seq 3382 A R +
seq 4923 CCT - + **
seq 5252 A W +
seq 7400 T C -
seq 16710 C - - #
seq 18248 T C -
seq 18962 CATA - + **
seq 19566 A M +
Lines marked ** are the new lines/strings formed from the * lines in the first list (input).
The # line is kept as it is because there is no consecutive position after it.
I'm stuck with the following logic and not sure how to proceed:
use strict;
use warnings;
use Data::Dumper;

while (<>) {
    chomp;
    my @els = split(/\s+/, $_);
    # Process indel
    my @temp = ();
    if ($els[3] eq "-") {
        push @temp, $_;
    }
    # How can I group them appropriately?
    print Dumper \@temp;
    # And print according to the input ordering
}
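One possible way to do the grouping is to keep a buffer of pending "-" lines and flush it whenever the next line is not a "-" line or its position is not exactly one more than the last buffered position. The sketch below is only an illustration under a few assumptions: the names `group_indels`, `$flush`, and `@run` are made up here, the `**` suffix is appended only to mirror the sample output, and the strand column is not compared within a run. The sample input is embedded so the script runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Merge runs of indel lines (4th column eq "-") with consecutive positions.
# Takes a list of input lines and returns the list of output lines.
sub group_indels {
    my @lines = @_;
    my @out;
    my @run;    # pending indel lines, each stored as an array ref of fields

    # Emit the pending run as one line, then reset the buffer.
    my $flush = sub {
        return unless @run;
        if (@run == 1) {
            push @out, join ' ', @{ $run[0] };    # lone indel: keep as is
        }
        else {
            my ($id, $pos, undef, $alt, $strand) = @{ $run[0] };
            my $letters = join '', map { $_->[2] } @run;
            # '**' mirrors the marker in the sample output (assumption)
            push @out, join ' ', $id, $pos, $letters, $alt, $strand, '**';
        }
        @run = ();
    };

    for my $line (@lines) {
        my @f = split ' ', $line;
        next unless @f;    # skip blank lines
        if (@f >= 4 && $f[3] eq '-') {
            # A gap in positions ends the current run.
            $flush->() if @run && $f[1] != $run[-1][1] + 1;
            push @run, \@f;
        }
        else {
            $flush->();
            push @out, join ' ', @f;
        }
    }
    $flush->();    # do not lose a run that ends the input
    return @out;
}

my $sample = <<'END';
seq 75 T G -
seq 3185 A R +
seq 3382 A R +
seq 4923 C - + *
seq 4924 C - + *
seq 4925 T - + *
seq 5252 A W +
seq 7400 T C -
seq 16710 C - - #
seq 18248 T C -
seq 18962 C - + *
seq 18963 A - + *
seq 18964 T - + *
seq 18965 A - + *
seq 19566 A M +
END

my @result = group_indels(split /\n/, $sample);
print "$_\n" for @result;
```

To read from a file or standard input instead of the embedded sample, `group_indels(<>)` should work in place of the `split` call. Because the buffer is flushed as soon as a non-matching line arrives, the output keeps the input ordering without a second pass.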