Searching by string matching

I have a query file, a text file like this:

fooLONGcite 
GetmoreDATA 
stringMATCH 
GOODthing

And a subject file, another text file like this:

sometingfooLONGcite 
anyotherfooLONGcite 
matchGetmoreDATA 
GETGOODthing 
brotherGETDATA 
CITEMORETHING 
TOOLONGSTUFFETC

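The expected result is to pull out the lines of the subject file that contain one of the query strings and print them. So the output should be: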

sometingfooLONGcite 
anyotherfooLONGcite 
matchGetmoreDATA  
GETGOODthing

This is my Perl script, but it does not work. Could you help me find out where the problem is? Thanks.

#!/usr/bin/perl 
use strict; 

# to check the command line option 
if($#ARGV<0){ 
    printf("Usage: \n <tag> <seq> <outfile>\n"); 
    exit 1; 
} 

# to open the given infile file 
open(tag, $ARGV[0]) or die "Cannot open the file $ARGV[0]"; 
open(seq, $ARGV[1]) or die "Cannot open the file $ARGV[1]"; 

my %seqhash =(); 
my $tag_id; 
my $tag_seq; 
my $seq_id; 
my $seq_seq; 
my $seq; 
my $i = 0; 

print "Processing cds seq\n"; 
#check the seq file 
while(<seq>){ 
    my @line = split; 
    if($i != 0){ 
     $seqhash{$seq_seq} = $seq; 
     $seq = ""; 
     print "$seq_seq\n"; 
    } 
    $seq_seq = $line[0]; 
    $i++; 
} 

while(<tag>){ 
    my @tagline = split; 
    $tag_seq = $tagline[0]; 
    $seq = $seqhash{$seq_seq}; 
    #print "$tag_seq\n"; 
    print "$seq\n"; 
    #print output ">$id\n$seq\n"; 
} 
#print "Ending of Processing gff\n"; 

close(tag); 
close(seq);

Source

2012-02-01 Jianguo

[What have you tried?](http://mattgemmell.com/2008/12/08/what-have-you-tried/) – 2012-02-01 21:24:24

I added my script. – Jianguo 2012-02-01 21:28:26

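As far as I understand, you are looking for part of a string rather than an exact match. Here is a script that does what I think you are looking for:

Contents of script.pl. I assume the query file is small, because I put all of its content into a single regular expression: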

use warnings; 
use strict; 

## Check arguments. 
die qq[Usage: perl $0 <query_file> <subject_file>\n] unless @ARGV == 2; 

## Open input files. Abort on error. 
open my $fh_query, qq[<], shift @ARGV or die qq[Cannot open input file: $!\n]; 
open my $fh_subject, qq[<], shift @ARGV or die qq[Cannot open input file: $!\n]; 

## Variable to save a regex with alternations of the content of the 'query' file. 
my $query_regex; 

{ 
    ## Read content of the 'query' file in slurp mode. 
    local $/ = undef; 
    my $query_content = <$fh_query>; 

    ## Split into words, escape regex metacharacters and join them with '|'. 
    my @words = grep { length } split /\s+/, $query_content; 
    my $alternation = join q[|], map { quotemeta } @words; 
    $query_regex = qr/(?i:($alternation))/; 
} 

## Read the 'subject' file and print each line that matches (ignoring case) 
## any word of the 'query' file. 
while (<$fh_subject>) { 
    if (m/$query_regex/) { 
        print; 
    } 
}

Run the script:

perl script.pl query.txt subject.txt

And the result:

sometingfooLONGcite 
anyotherfooLONGcite 
matchGetmoreDATA 
GETGOODthing
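
If the query file might contain characters that are special in a regular expression, a plain substring test avoids building a pattern at all. The following is only a minimal sketch of that alternative, not the script above: the sub name `substring_matches` and the inline data are assumptions made for illustration, and the comparison is made case-insensitive by lower-casing both sides.

```perl
use strict;
use warnings;

# Hypothetical helper: return the subject lines that contain any query
# string as a case-insensitive substring.
sub substring_matches {
    my ($queries, $subjects) = @_;

    # Normalise the queries once: trim whitespace, skip empties, lower-case.
    my @needles;
    for my $q (@$queries) {
        (my $word = $q) =~ s/^\s+|\s+$//g;
        push @needles, lc $word if length $word;
    }

    my @hits;
    for my $s (@$subjects) {
        (my $line = $s) =~ s/\s+$//;
        my $haystack = lc $line;
        # index() returns -1 when the substring is absent.
        push @hits, $line if grep { index($haystack, $_) >= 0 } @needles;
    }
    return @hits;
}

# Sample data copied from the question.
my @queries  = qw(fooLONGcite GetmoreDATA stringMATCH GOODthing);
my @subjects = qw(sometingfooLONGcite anyotherfooLONGcite matchGetmoreDATA
                  GETGOODthing brotherGETDATA CITEMORETHING TOOLONGSTUFFETC);

print "$_\n" for substring_matches(\@queries, \@subjects);
```

With the sample data from the question this prints the same four lines as the expected output. It tests every query against every subject line, so it is only reasonable while the query file stays small.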

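Source

2012-02-01 22:07:53 Birei

It works fine. But it does not work if I use another file. Could you help me fix it? Thanks. Here is a link to the new data: http://stackoverflow.com/questions/9101082/extract-sequence-information-using-tag-sequence – Jianguo 2012-02-01 23:04:51

Your current code doesn't make much sense; you even reference variables that you never assign anything to.

You just need to read the first file into a hash, then check each line of the second file against it.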

while (my $line = <FILE>) 
{ 
    chomp($line); 
    $hash{$line} = 1; 
} 

... 

while (my $line = <FILE2>) 
{ 
    chomp($line); 
    if (exists $hash{$line}) 
    { 
        print "$line\n"; 
    } 
}
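
The fragment above could be completed roughly as follows. This is a sketch under assumptions: the sub name `exact_matches` and the inline data are made up for illustration. Note that a hash lookup only finds lines that are identical to a query string, so on the sample data from the question, where each query is embedded in a longer line, it finds nothing; it fits the case where whole lines must match.

```perl
use strict;
use warnings;

# Hypothetical helper: return the subject lines that are exactly equal
# to one of the query lines (after trimming surrounding whitespace).
sub exact_matches {
    my ($queries, $subjects) = @_;

    # Build the lookup table from the query lines.
    my %seen;
    for my $q (@$queries) {
        (my $key = $q) =~ s/^\s+|\s+$//g;
        $seen{$key} = 1 if length $key;
    }

    # Keep only the subject lines that are present in the table.
    my @hits;
    for my $s (@$subjects) {
        (my $line = $s) =~ s/^\s+|\s+$//g;
        push @hits, $line if exists $seen{$line};
    }
    return @hits;
}

# Illustrative data only: one whole-line match, two lines that merely
# contain a query string.
my @queries  = ("fooLONGcite\n", "GOODthing\n");
my @subjects = ("sometingfooLONGcite\n", "GOODthing\n", "GETGOODthing\n");

print "$_\n" for exact_matches(\@queries, \@subjects);
```

Here only `GOODthing` is printed, because it is the only subject line that equals a query line in full.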

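Source

2012-02-01 21:34:56

I ran this code; why did nothing happen? Thank you very much. – Jianguo 2012-02-01 21:45:05

::sigh:: Because it is just an example of what you need to do. – 2012-02-01 21:49:34

Could you help me complete this code? Please. I am very new to Perl. Thank you very much for your help. – Jianguo 2012-02-01 21:53:05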
