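I've been working on this problem for a while now: I'm having trouble capturing and storing strings with a Perl regular expression.
I have a file with a hundred FASTA sequences arranged like this: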
>gi|192567|gb|AAA37417.1| cystic fibrosis transmembrane conductance regulator [Mus musculus]
MQKSPLEKASFISKLFFSWTTPILRKGYRHHLELSDIYQAPSADSADHLSEKLEREWDREQASKKNPQLIHALRRCFFWRFLFYGILLYLGEVTKAVQPVLLGRIIASYDPENKVERSIAIYLGIGLCLLFIVRTLLLHPAIFGLHRIGMQMRTAMFSLIYKKTLKLSSRVLDKISIGQLVSLLSNNLNKFDEGLALAHFIWIAPLQVTLLMGLLWDLLQFSAFCGLGLLIILVIFQAILGKMMVKYRDQRAAKINERLVITSEIIDNIYSVKAYCWESAMEKMIENLREVELKMTRKAAYMRFFTSSAFFFSGFFVVFLSVLPYTVINGIVLRKIFTTISFCIVLRMSVTRQFPTAVQIWYDSFGMIRKIQDFLQKQEYKVLEYNLMTTGIIMENVTAFWEEGFGELLQKAQQSNGDRKHSSDENNVSFSHLCLVGNPVLKNINLNIEKGEMLAITGSTGLGKTSLLMLILGELEASEGIIKHSGRVSFCSQFSWIMPGTIKENIIFGVSYDEYRYKSVVKACQLQQDITKFAEQDNTVLGEGGVTLSGGQRARISLARAVYKDADLYLLDSPFGYLDVFTEEQVFESCVCKLMANKTRILVTSKMEHLRKADKILILHQGTSYFYGTFSELQSLRPSFSSKLMGYDTFDQFTEERRSSILTETLRRFSVDDSSAPWSKPKQSFRQTGEVGEKRKNSILNSFSSVRKISIVQKTPLCIDGESDDLQEKRLSLVPDSEQGEAALPRSNMIATGPTFPGRRRQSVLDLMTFTPNSGSSNLQRTRTSIRKISLVPQISLNEVDVYSRRLSQDSTLNITEEINEEDLKECFLDDVIKIPPVTTWNTYLRYFTLHKGLLLVLIWCVLVFLVEVAASLFVLWLLKNNPVNSGNNGTKISNSSYVVI ITSTSFYYIFYIYVGVADTLLALSLFRGLPLVHTLITASKILHRKMLHSILHAPMSTISKLKAGGILNRFSKDIAILDDFLPLTIFDFIQLVFIVIGAIIVVSALQPYIFLATVPGLVVFILLRAYFLHTAQQLKQLESEGRSPIFTHLVTSLKGLWTLRAFRRQTYFETLFHKALNLHTANWFMYLATLRWFQMRIDMIFVLFFIVVTFISILTTGEGEGTAGIILTLAMNIMSTLQWAVNSSIDTDSLMRSVSRVFKFIDIQTEESMYTQIIKELPREGSSDVLVIKNEHVKKSDIWPSGGEMVVKDLTVKYMDDGNAVLENISFSISPGQRVGLLGRTGSGKSTLLSAFLRMLNIKGDIEIDGVSWNSVTLQEWRKAFGVITQKVFIFSGTFRQNLDPNGKWKDEEIWKVADEVGLKSVIEQFPGQLNFTLVDGGYVLSHGHKQLMCLARSVLSKAKIILLDEPSAHLDPITYQVIRRVLKQAFAGCTVILCEHRIEAMLDCQRFLVIEESNVWQYDSLQALLSEKSIFQQAISSSEKMRFFQGRHSSKHKPRTQITALKEETEEEVQETRL
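I wrote a subroutine that opens the file and reads it one sequence at a time. For each sequence, I want to take the gi number from the header and the long run of capital letters, and add them as strings to a growing array. However, I can't work out a regex that stores these values. Here is my current subroutine, which I tweaked to check whether I'm actually capturing the GI number: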
sub getFASTA {
    my ($filename) = @_;
    my @FASTA_arr;
    $/ = "\n\n";
    open (my $fh, '<', $filename) or
        die ("Could not open file: $filename");
    while (<$fh>) {
        chomp $_;
        $_ =~ /^>gi|(\d*?)|/s;
        say "$1";
    }
    close $fh;
    #say join(" ", @FASTA_arr);
}
However, trying to run this returns:
Use of uninitialized value $1 in string at sequenceAlignment.pl line 30, <$fh> chunk 1.
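This is printed once for every sequence, so 100 times in total.
So, any ideas what's wrong? I'm almost certain it's a regex problem, because when I change it to "$_ =~ /(>gi|)/s;" it works fine and simply prints 100 ">gi|"s.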
You need to escape the pipes in the regex: '$_ =~ /^>gi\|(\d*?)\|/s'
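To spell out why the escaping matters: an unescaped | is the alternation operator, so /^>gi|(\d*?)|/ means "^>gi, or (\d*?), or nothing". On a header that starts with >gi the first alternative matches on its own, the capture group never takes part, and $1 stays undefined, which is exactly the warning shown. Below is a rough sketch of how the subroutine might look with the pipes escaped and both the GI number and the sequence stored. It assumes records are separated by a blank line (as the "\n\n" separator in the question implies) and that every header starts with ">gi|digits|". The names parse_record and get_fasta are made up for illustration and are not from the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the question): turn one FASTA record
# into [gi_number, sequence]. Returns nothing if the header does not match.
sub parse_record {
    my ($record) = @_;

    # First line is the header; the rest is the (possibly wrapped) sequence.
    my ($header, $body) = split /\n/, $record, 2;
    return unless defined $header;

    # \| is a literal pipe. \d+ requires at least one digit.
    return unless $header =~ /^>gi\|(\d+)\|/;
    my $gi = $1;

    # Remove line breaks and any other whitespace from the sequence.
    my $sequence = defined $body ? $body : '';
    $sequence =~ s/\s+//g;

    return [$gi, $sequence];
}

# Hypothetical replacement for getFASTA: returns a list of [gi, sequence] pairs.
sub get_fasta {
    my ($filename) = @_;
    my @records;

    # Assumption carried over from the question: a blank line separates records.
    # "local" restores the previous separator when the sub returns.
    local $/ = "\n\n";

    open(my $fh, '<', $filename)
        or die "Could not open file: $filename: $!";
    while (my $record = <$fh>) {
        chomp $record;
        my $parsed = parse_record($record) or next;  # skip records without a gi header
        push @records, $parsed;
    }
    close $fh;

    return @records;
}
```

With this in place, something like "say $_->[0] for get_fasta($filename);" should print one GI number per record. Checking that the match succeeded before reading $1 also avoids the uninitialized-value warning when a header does not fit the expected shape.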