I have a file, say data.fa, that contains multiple sequences. I want to search each one for a motif and print its header:
>sp|P08246|ELNE_HUMAN Neutrophil elastase OS=Homo sapiens GN=ELANE PE=1 SV=1
MTLGRRLACLFLACVLPALLLGGTALASEIVGGRRARPHAWPFMVSLQLRGGHFCGATLI
APNFVMSAAHCVANVNVRAVRVVLGAHNLSRREPTRQVFAVQRIFENGYDPVNLLNDIVI
LQLNGSATINANVQVAQLPAQGRRLGNGVQCLAMGWGLLGRNRGIASVLQELNVTVVTSL
CRRSNVCTLVRGRQAGVCFGDSGSPLVCNGLIHGIASFVRGGCASGLYPDAFAPVAQFVN
WIDSIIQRSEDNPCPHPRDPDPASRTHGGGGNGVQCLAMGWG
>sp|P31689|DNJA1_HUMAN DnaJ homolog subfamily A member 1 OS=Homo sapiens GN=DNAJA1 PE=1 SV=2
MVKETTYYDVLGVKPNATQEELKKAYRKLALKYHPDKNPNEGEKFKQISQAYEVLSDAKK
RELYDKGGEQAIKEGGAGGGFGSPMDIFDMFFGGGGRMQRERRGKNVVHQLSVTLEDLYN
GATRKLALQKNVICDKCEGRGGKKGAVECCPNCRGTGMQIRIHQIGPGMVQQIQSVCMEC
QGHGERISPKDRCKSCNGRKIVREKKILEVHIDKGMKDGQKITFHGEGDQEPGLEPGDII
>sp|P10144|GRAB_HUMAN Granzyme B OS=Homo sapiens GN=GZMB PE=1 SV=2
MQPILLLLAFLLLPRADAGEIIGGHEAKPHSRPYMAYLMIWDQKSLKRCGGFLIRDDFVL
TAAHCWGSSINVTLGAHNIKEQEPTQQFIPVKRPIPHPAYNPKNFSNDIMLLQLERKAKR
TRAVQPLRLPSNKAQVKPGQTCSVAGWGQTAPLGKHSHTLQEVKMTVQEDRKCESDLRHY
YDSTIELCVGDPEIKKTSFKGDSGGPLVCNKVAQGIVSYGRNNGMPPRACTKVSSFVHWI
KKTMKRYGNGVQCLAMGWG
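I want to print the header and the number of occurrences of the motif (GNGVQCLAMGWG), if any, to an output file. Yeah, newbie here! I have the following code: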
#!/usr/bin/perl
use strict;
use warnings;
print STDOUT "Enter the motif: ";
my $motif = <STDIN>;
chomp $motif;
my %seqs = %{ read_fasta_as_hash('data.fa') };
# Note: keys() returns the headers in no particular order
foreach my $id (keys %seqs) {
    # \Q...\E makes the motif match literally instead of as a regex
    if ($seqs{$id} =~ /\Q$motif\E/) {
        print $id, "\n";
        print $seqs{$id}, "\n";
    }
}
# Read a FASTA file into a hash: header line => concatenated sequence
sub read_fasta_as_hash {
    my $fn = shift;
    my $current_id = '';
    my %seqs;
    open my $fh, '<', $fn or die "Cannot open $fn: $!";
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^(>.*)$/) {
            $current_id = $1;              # header line, '>' included
        } elsif ($line !~ /^\s*$/) {       # skip blank lines
            $seqs{$current_id} .= $line;   # append wrapped sequence line
        }
    }
    close $fh or die $!;
    return \%seqs;
}
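The loop above only tests whether the motif is present. To get the number of occurrences, one common Perl idiom is to run the match with `/g` in list context and assign the result to an empty list in scalar context. This is a minimal sketch, assuming the motif should be matched literally and that non-overlapping matches are what is wanted; `count_motif` is a hypothetical name. The sample sequence is the Neutrophil elastase record from the question, joined into one string the way the reader joins wrapped lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping, literal occurrences of $motif in $seq.
sub count_motif {
    my ($seq, $motif) = @_;
    # m//g in list context returns every match; assigning that list
    # to () in scalar context yields the number of matches.
    my $count = () = $seq =~ /\Q$motif\E/g;
    return $count;
}

# Neutrophil elastase sequence from data.fa, wrapped lines joined.
my $seq = 'MTLGRRLACLFLACVLPALLLGGTALASEIVGGRRARPHAWPFMVSLQLRGGHFCGATLI'
        . 'APNFVMSAAHCVANVNVRAVRVVLGAHNLSRREPTRQVFAVQRIFENGYDPVNLLNDIVI'
        . 'LQLNGSATINANVQVAQLPAQGRRLGNGVQCLAMGWGLLGRNRGIASVLQELNVTVVTSL'
        . 'CRRSNVCTLVRGRQAGVCFGDSGSPLVCNGLIHGIASFVRGGCASGLYPDAFAPVAQFVN'
        . 'WIDSIIQRSEDNPCPHPRDPDPASRTHGGGGNGVQCLAMGWG';

# %02d zero-pads the count to two digits, as in the expected "02".
printf "%s: %02d\n", 'ELNE_HUMAN', count_motif($seq, 'GNGVQCLAMGWG');
# prints: ELNE_HUMAN: 02
```

Because the sequence is joined before matching, a motif that happens to straddle a line break in the file is still counted.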
I expect the output to look like this:
sp|P08246|ELNE_HUMAN Neutrophil elastase OS=Homo sapiens GN=ELANE PE=1 SV=1: 02
sp|P10144|GRAB_HUMAN Granzyme B OS=Homo sapiens GN=GZMB PE=1 SV=2: 01
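The expected output lists the records in the same order as in data.fa, but a Perl hash does not remember insertion order, so `keys %seqs` returns headers in an unpredictable order. One way to keep file order is to push each header onto an array as it is read and iterate over that array later. This is a minimal sketch, assuming headers start with '>' and that the '>' should be dropped as in the expected output; `read_fasta_ordered` is a hypothetical name, and the example reads a small in-memory string so it runs without data.fa.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical reader: returns (array ref of headers in file order,
# hash ref mapping header => concatenated sequence).
sub read_fasta_ordered {
    my ($fh) = @_;
    my (@ids, %seq_of, $id);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;            # skip blank lines
        if ($line =~ /^>(.*)$/) {            # header: drop the leading '>'
            $id = $1;
            push @ids, $id;                  # remember file order
            $seq_of{$id} = '';
        } elsif (defined $id) {
            $seq_of{$id} .= $line;           # sequence line for current header
        }
    }
    return (\@ids, \%seq_of);
}

# Small in-memory FASTA so the example runs without data.fa.
my $fasta = ">third\nAAA\n>first\nCCC\nGGG\n>second\nTTT\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
my ($ids, $seq_of) = read_fasta_ordered($fh);
close $fh;

print "$_ => $seq_of->{$_}\n" for @$ids;
# prints, in file order:
# third => AAA
# first => CCCGGG
# second => TTT
```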
I need help.
Not really, but the most important part for me is the output order, i.e. >FASTA_header1: number of motifs – user3489854
Some of the functions are hard to fit into my head. Could you please post the actual script so I can run it? Thanks a lot. – user3489854
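A complete script along these lines might look like the sketch below. It rests on a few assumptions that differ from the question's code: the file name and motif come from the command line instead of the STDIN prompt (for example `perl count_motif.pl data.fa GNGVQCLAMGWG > out.txt` to write an output file), headers start with '>', which is dropped from the output, only records with at least one hit are printed, and counts are non-overlapping literal matches zero-padded to two digits. The script name and all sub names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read FASTA records from a filehandle, keeping headers in file order.
sub read_fasta_ordered {
    my ($fh) = @_;
    my (@ids, %seq_of, $id);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;            # skip blank lines
        if ($line =~ /^>(.*)$/) {            # header line without the '>'
            $id = $1;
            push @ids, $id;
            $seq_of{$id} = '';
        } elsif (defined $id) {
            $seq_of{$id} .= $line;           # join wrapped sequence lines
        }
    }
    return (\@ids, \%seq_of);
}

# Count non-overlapping, literal occurrences of $motif in $seq.
sub count_motif {
    my ($seq, $motif) = @_;
    my $count = () = $seq =~ /\Q$motif\E/g;
    return $count;
}

# Build "header: NN" lines, in file order, for records containing the motif.
sub motif_report {
    my ($ids, $seq_of, $motif) = @_;
    my @lines;
    for my $id (@$ids) {
        my $n = count_motif($seq_of->{$id}, $motif);
        push @lines, sprintf("%s: %02d", $id, $n) if $n > 0;
    }
    return @lines;
}

# Usage: perl count_motif.pl FASTA_FILE MOTIF
if (@ARGV == 2) {
    my ($file, $motif) = @ARGV;
    open my $in, '<', $file or die "Cannot open $file: $!";
    my ($ids, $seq_of) = read_fasta_ordered($in);
    close $in;
    print "$_\n" for motif_report($ids, $seq_of, $motif);
} else {
    warn "Usage: $0 FASTA_FILE MOTIF\n";
}
```

To keep the '>' in the output, as in the comment's ">FASTA_header1" format, the header capture can be changed from `/^>(.*)$/` to `/^(>.*)$/`.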