2014-11-21

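Remove strings from an array when the string matches part of a sentence - Perl

I match multiple patterns in a string to fill an array. The input file looks like this: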

I love cat [chats;chaton;chatterie] and rabbit [lapins;lapereau] # J'aime les chats et les lapins # 2.8 
My father [père;parent;papa] lives in New-York # Mon père vit à New-York  # 1.8 


use strict;
use warnings;
use Data::Dump;

open(my $text_fh, "<", $ARGV[0])
    or die "cannot open < $ARGV[0]: $!";

while (my $line = <$text_fh>) {
    chomp $line;

    # Expect three tab-separated fields: English sentence, French sentence, score.
    # (The sample above shows the fields separated by '#'.)
    if ($line =~ /^(.+)\t(.+)\t(.+)$/) {
        my $english_sentence = $1;
        my $french_sentence  = $2;
        my $score            = $3;

        print $english_sentence . "#" . $french_sentence . "\n";

        # One array of ';'-separated alternatives per [...] group
        my @data = map [ split /;/ ], $line =~ /\[ ([^\[\]]+) \] /xg;
        dd \@data;
        print "\n";
    }
}

close $text_fh;


Current output:

I love cat [chats;chaton;chatterie] and rabbit [lapins;lapereau] # J'aime les chats et les lapins
Array==>[["chats", "chaton", "chatterie"], ["lapins", "lapereau"]] 

My father [père;parent;papa] lives in New-York # Mon père vit à New-York 
Array==>[["père", "parent", "papa"]] 


Desired output:

I love cat [chats;chaton;chatterie] and rabbit [lapins;lapereau] # J'aime les chats et les lapins
[["chats"], ["lapins"]] 

My father [père;parent;papa] lives in New-York # Mon père vit à New-York
[["père"]]

Re "I need to remove strings from the array when the string matches part of the sentence": your output seems to indicate you want the opposite? – ikegami 2014-11-21 21:08:19


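1. For each array, create a hash whose keys are the array's values. (The values of the hash elements don't matter.) 2. Split the sentence into words. 3. For each word, for each hash, delete the word from the hash. 4. For each hash, create an array from the hash's keys. – ikegami 2014-11-21 21:12:12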
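The four steps in the comment above can be sketched roughly as follows. This is only an illustration, not ikegami's own code: the helper name `remove_matched` is hypothetical, words are assumed to be separated by whitespace, and the surviving keys are sorted only because hash order is not stable. It follows the literal wording of the question (drop the candidates that occur in the sentence), which is the opposite of the desired output shown in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: drop from each candidate list every word
# that occurs in the sentence.
sub remove_matched {
    my ($sentence, @lists) = @_;

    # 1. One hash per array; the keys are the array values, the values are unused.
    my @hashes = map { my %h; @h{@$_} = (); \%h } @lists;

    # 2. Split the sentence into words (assumption: whitespace-separated).
    my @words = split ' ', $sentence;

    # 3. For each word, delete it from every hash.
    for my $word (@words) {
        delete $_->{$word} for @hashes;
    }

    # 4. Rebuild one array per hash from the remaining keys.
    return map { [ sort keys %$_ ] } @hashes;
}

my @left = remove_matched(
    "J'aime les chats et les lapins",
    [qw(chats chaton chatterie)],
    [qw(lapins lapereau)],
);
print join(';', @$_), "\n" for @left;
```

To keep the matching words instead, as in the desired output, step 3 would have to collect the matched words rather than delete them.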




use utf8; 
use strict; 
use warnings; 
use 5.010; 
use autodie; 

use open qw/ :std :encoding(UTF-8) /; 

use Data::Dump; 

open my $fh, '<', 'sentences.txt'; 

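while (<$fh>) {

    # Fields: English sentence, French sentence, score
    my @sentences = split /\s*#\s*/;
    next unless @sentences == 3;

    print join(' # ', @sentences[0,1]), "\n";

    # One array of alternatives per [...] group in the English sentence
    my @data = map [ split /;/ ], $sentences[0] =~ /\[ ([^\[\]]+) \] /xg;

    # Keep only the alternatives that occur as whole words in the French sentence
    $_ = [ grep { $sentences[1] =~ /\b\Q$_\E\b/ } @$_ ] for @data;

    dd \@data;
    print "\n";
}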


Output:

I love cat [chats;chaton;chatterie] and rabbit [lapins;lapereau] # J'aime les chats et les lapins
[["chats"], ["lapins"]] 

My father [père;parent;papa] lives in New-York # Mon père vit à New-York 



use utf8; 
use strict; 
use warnings; 
use 5.010; 
use autodie; 

use open qw/ :std :utf8 /; 

open my $fh, '<', 'sentences.txt'; 

while (<$fh>) {

    my @sentences = split /\s*#\s*/;
    next unless @sentences == 3;

    print join(' # ', @sentences[0,1]), "\n";

    # Rewrite each [...] group in place, keeping only the alternatives
    # that occur as whole words in the French sentence
    $sentences[0] =~ s{ \[ ([^\[\]]+) \] }{
        my @words = split /;/, $1;
        @words = grep { $sentences[1] =~ /\b\Q$_\E\b/ } @words;
        sprintf "[%s]", join ';', @words;
    }exg;

    print join(' # ', @sentences[0,1]), "\n\n";
}



Output:

I love cat [chats;chaton;chatterie] and rabbit [lapins;lapereau] # J'aime les chats et les lapins
I love cat [chats] and rabbit [lapins] # J'aime les chats et les lapins 

My father [père;parent;papa] lives in New-York # Mon père vit à New-York 
My father [père] lives in New-York # Mon père vit à New-York 

It works well. Do you think I could get this output directly? My father [père] lives in New-York # Mon père vit à New-York – 2014-11-22 12:48:12


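@ChesterMcAllister: I have added that to my solution. It would be more encouraging if you tried to make these changes yourself. Unlike a forum, where you can expect a custom response, Stack Overflow treats the asker as the least important reader of a solution. – Borodin 2014-11-22 17:13:52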



use strict;
use warnings;

while (<DATA>) {
    my ($English, $French, $repl, %FrWords);
    if (($English, $French) = m/^([^#]*)\#([^#]*)\#/) {
        # Set of whitespace-separated words in the French sentence
        @FrWords{ split /\h+/, $French } = undef;

        # Keep only the alternatives that are words of the French sentence
        $English =~ s{ \[ ([^\[\]]*) \] }{
            $repl = join(';', grep { exists $FrWords{$_} } split /;/, $1);
            '[' . $repl . ']';
        }xeg;
        print $English, '#', $French, "\n";
    }
}

__DATA__

I love cat [chats;chaton;chatterie] and rabbit [lapins;lapereau] # J'aime les chats et les lapins # 2.8 
My father [père;parent;papa] lives in New-York # Mon père vit à New-York  # 1.8 


Output:

I love cat [chats] and rabbit [lapins] # J'aime les chats et les lapins
My father [père] lives in New-York # Mon père vit à New-York  

It works for my sample data, but in my full file one word can correspond to 2 or more words. For example: 'young ==> plus jeune' – 2014-11-24 11:56:01


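Actually, the code does this: 'Younger [plus;jeune] father [père;parent;papa] # plus jeune père # 1.8 ==> Younger [plus;jeune] father [père] # plus jeune père'. The real difficulty in your problem is determining where words begin and end. Unless you have a handle on natural language, you will find it hard to succeed. – sln 2014-11-24 17:07:57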
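One possible way to cope with multi-word candidates such as `plus jeune` is to test each candidate against the whole French sentence rather than against a set of single words, in the same way as the `\b\Q...\E\b` test in the earlier answer. The following is a minimal sketch under stated assumptions: the helper name `keep_matching` is hypothetical, candidates are assumed to be separated by `;`, and the input line (including the candidate `cadet`) is made up for illustration. It does not solve the general word-boundary problem raised above; it only shows a phrase being matched where a per-word hash lookup would miss it.

```perl
use strict;
use warnings;
use utf8;                             # the source contains non-ASCII literals
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: keep, inside each [...] group of the English sentence,
# only the candidates that occur as whole words or phrases in the French one.
sub keep_matching {
    my ($english, $french) = @_;

    $english =~ s{ \[ ([^\[\]]*) \] }{
        # \Q...\E quotes the candidate, so the space in a phrase matches literally
        my @kept = grep { $french =~ /\b\Q$_\E\b/ } split /;/, $1;
        '[' . join(';', @kept) . ']';
    }xeg;

    return $english;
}

print keep_matching(
    'Younger [plus jeune;cadet] father [père;parent;papa]',
    'plus jeune père',
), "\n";
```

Because the candidate is matched as one unit, `plus jeune` is kept only when both words appear next to each other in the French sentence.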
