Perl: finding similar elements in two arrays

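I want to retrieve the elements of the @amplicon_exon array that contain an element similar to one in the @failedamplicons array. Each element in @failedamplicons is unique and can match only a single element in @amplicon_exon. I have tried two nested for loops, but I get duplicate values. Is there a better way to find and retrieve similar values in two arrays?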

@failedamplicons: example: 
OCP1_FGFR3_8.87 
OCP1_AR_14.89 

@amplicon_exon: example: 
TEST_Focus_ERBB2_2:22:ERBB2:GENE_ID=ERBB2;PURPOSE=CNV,Hotspot;CNV_ID=ERBB2;CNV_HS=1 
OCP1_FGFR3_8:intron:FGFR3:GENE_ID=FGFR3;PURPOSE=CNV;CNV_ID=FGFR3;CNV_HS=1 
OCP1_CDK6_14:intron:CDK6:GENE_ID=CDK6;PURPOSE=CNV;CNV_ID=CDK6;CNV_HS=1

Here is the code with the two for loops:

my $i = 0; 
my $j = 0; 

for ($i = 0; $i < @amplicon_exon; $i++) { 

    for ($j = 0; $j < @failedamplicons; $j++) { 

     my $fail_amp = (split /\./, $failedamplicons[$j])[0]; 

     #print "the failed amp before match is $fail_amp\n"; 

     if (index($amplicon_exon[$i], $fail_amp) != -1) { 

      #print "the amplicon exon that matches $amplicon_exon[$i] and sample is $sample_id\n"; 
      print "the failed amp that matches $fail_amp and sample is $sample_id\n"; 

      my @parts = split /:/, $amplicon_exon[$i]; 
      my $exon_amp = $parts[1]; 

      next unless $parts[3] =~ /Hotspot/; #includes only Hotspot amplicons 
      my $gene_res = $parts[2]; 
      my $depth  = (split /\./, $failedamplicons[$j])[1]; 
      my @total_amps = (
       $run_name, $sample_id, $gene_res, $depth, $fail_amp, $run_date, $matrix_status 
      ); 

      my $lines = join "\t", @total_amps; 

      push(@finallines, $lines); 
     } 
    } 
}


2016-12-21 user3781528

Can you give a precise criterion for what counts as "_similar_"? – zdim

The amplicon_exon element must contain the complete string that precedes the "." in the failedamplicons element. For example, 'OCP1_FGFR3_8:intron:FGFR3:GENE_ID=FGFR3;PURPOSE=CNV;CNV_ID=FGFR3;CNV_HS=1' contains 'OCP1_FGFR3_8'. Thanks – user3781528

@user3781528: I have tidied up your Perl code so that I could read it. Please post readable code in future. – Borodin

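split and grep are your friends here, as is the idiomatic way of iterating over a list. Simply loop over the first array and extract only the part you want to match (use split to break the element on the . character, then take only the first field). Then grep the second array with a regex anchored at the start of each element and running up to the first ::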

for my $elem (@failedamplicons) {
    my $to_match = (split /\./, $elem)[0];
    if (my ($matched) = grep { $_ =~ /^\Q$to_match:/ } @amplicon_exon) {
        print "$matched\n";
    }
}
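
As a possible alternative for longer lists, here is a minimal sketch that is not part of the original answer. Each failed amplicon matches at most one entry, so a hash keyed on the amplicon name (the text before the first :) avoids rescanning @amplicon_exon for every failed amplicon. The sample data is taken from the question. The names %exon_for and @matches are introduced here for illustration only, and the sketch assumes the amplicon names in @amplicon_exon are unique.

```perl
use strict;
use warnings;

# Sample data copied from the question
my @failedamplicons = ('OCP1_FGFR3_8.87', 'OCP1_AR_14.89');
my @amplicon_exon = (
    'TEST_Focus_ERBB2_2:22:ERBB2:GENE_ID=ERBB2;PURPOSE=CNV,Hotspot;CNV_ID=ERBB2;CNV_HS=1',
    'OCP1_FGFR3_8:intron:FGFR3:GENE_ID=FGFR3;PURPOSE=CNV;CNV_ID=FGFR3;CNV_HS=1',
    'OCP1_CDK6_14:intron:CDK6:GENE_ID=CDK6;PURPOSE=CNV;CNV_ID=CDK6;CNV_HS=1',
);

# Index every amplicon_exon line by its name, i.e. the text before the first ':'.
# %exon_for is a hypothetical name, not something from the original post.
my %exon_for = map {
    my ($name) = split /:/, $_, 2;
    ($name => $_);
} @amplicon_exon;

# Look up each failed amplicon by the part before the '.'
my @matches;
for my $elem (@failedamplicons) {
    my ($name, $depth) = split /\./, $elem;
    next unless exists $exon_for{$name};    # no exact name match, skip
    push @matches, [ $name, $depth, $exon_for{$name} ];
}

# One tab-separated row per match: name, depth, full amplicon_exon line
print join("\t", @$_), "\n" for @matches;
```

With the sample data, only OCP1_FGFR3_8 has an entry in @amplicon_exon, so a single row is collected. Because the lookup is an exact match on the name, a short name cannot accidentally match inside a longer one, which a substring test with index can do.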


2016-12-21 19:03:27 stevieb

Thanks, this works really well! – user3781528

I changed the regex so that any metacharacters in '$to_match' are escaped. For more details, see http://perldoc.perl.org/perlre.html#Quoting-metacharacters – shawnhcorey

@shawnhcorey Thanks for fixing my oversight! – stevieb
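
To illustrate the escaping point raised in the comments, here is a small sketch with made-up data. The amplicon name containing + is hypothetical and does not come from the question. It only shows how an unescaped metacharacter changes what the pattern matches.

```perl
use strict;
use warnings;

# Hypothetical name containing '+', which is a regex quantifier
my $to_match = 'OCP1_A+B_1';
my @lines = ('OCP1_AAB_1:intron:X', 'OCP1_A+B_1:intron:X');

# Without \Q, "A+" means "one or more A", so the wrong line matches
my @unescaped = grep { /^$to_match:/ } @lines;

# With \Q...\E, '+' is treated as a literal character
my @escaped = grep { /^\Q$to_match\E:/ } @lines;

print "unescaped: @unescaped\n";
print "escaped:   @escaped\n";
```

The unescaped pattern picks up OCP1_AAB_1 and misses the literal OCP1_A+B_1, while the \Q version matches only the intended line.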
