2014-11-16

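Searching for a string in a file using a regular expression in Perl

I'm new to Perl. I'm reading text from a file and want to replace some words with their French translations. I managed to do it word by word, but not for multi-word expressions/strings, and I'm having trouble getting that to work in code.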

My word-by-word code:

my $filename = 'assign3.txt'; 
my @lexicon_en = ("Winter","Date", "Due Date", "Problem", "Summer","Mark","Fall","Assignment","November"); 
my @lexicon_fr = ("Hiver", "Date", "Date de Remise","Problème","Été", "Point", "Automne", "Devoir", "Novembre"); 
my $i=1; 
open(my $fh, '<:encoding(UTF-8)', $filename) 
    or die "Could not open file $filename !"; 
while (<$fh>) { 
    for my $word (split) 
    { 
     print " $i. $word \n"; 
     $i++; 
     for (my $j=0; $j < 9;$j++){ 
      if ($word eq $lexicon_en[$j]){ 
      print "Found one! - j value is $j\n"; 
      } 
     } 
    } 
} 
print "\ndone here!!\n"; 

Here is the regular expression I'm trying to use:

/\w+\s\w+/ 

This is my code for matching a string:

while (<>) { 
     print ("this is text: $_ \n"); 

     if ((split (/Due\sDate/),$_) eq "Due Date"){ 
      print "yes!!\n"; 
     } 
} 
Can you give some sample output showing exactly what you're looking for, so that I can send you a sample script?

Answers

1

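Use \b to detect word boundaries instead of matching the whitespace between words with \s. (Note that \w matches word characters, not whitespace.)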
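To illustrate the difference, here is a minimal sketch. The sample text and variable names are made up for the demonstration. `\b` is a zero-width assertion that matches between a word character and a non-word character, so it works next to punctuation, whereas `\s` needs an actual whitespace character to be present.

```perl
use strict;
use warnings;

# Hypothetical sample text, chosen only to show the two behaviours.
my $text = 'Due Date: update the Date.';

# \b is zero-width: "Date" is found before ":" and before ".",
# but the "date" inside "update" is not a whole word and is skipped.
my @with_boundary = $text =~ /\b(Date)\b/gi;

# Requiring whitespace on both sides finds nothing here, because
# neither whole-word "Date" is followed by a whitespace character.
my @with_space = $text =~ /\s(Date)\s/gi;

print scalar(@with_boundary), " match(es) with \\b\n";
print scalar(@with_space), " match(es) with \\s\n";
```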

Combining Steven Klassen's solution with the approach from "How to replace a set of search/replace pairs?":

#!/usr/bin/perl 
use strict; 
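use warnings;
use utf8;                             # the lexicon contains accented literals; needed for lc() to handle "Été"
binmode(STDOUT, ':encoding(UTF-8)');  # print the accented characters as UTF-8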

my %lexicon = (
    'Winter' => 'Hiver', 
    'Date' => 'Date', 
    'Due Date' => 'Date de Remise', 
    'Problem' => 'Problème', 
    'Summer' => 'Été', 
    'Mark' => 'Point', 
    'Fall' => 'Automne', 
    'Assignment' => 'Devoir', 
    'November' => 'Novembre', 
); 

# add lowercase 
for (keys %lexicon) { 
    $lexicon{lc($_)} = lc($lexicon{$_}); 
    print $_ . " " . $lexicon{lc($_)} . "\n"; 
} 

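# Combine to one big regexp. Longest phrases come first so that a multi-word
# entry wins over a shorter entry starting at the same position, and
# quotemeta() escapes any regex metacharacters in the keys.
# https://stackoverflow.com/questions/17596917/how-to-replace-a-set-of-search-replace-pairs?answertab=votes#tab-top
my $regexp = join '|',
    map  { '\b' . quotemeta($_) . '\b' }
    sort { length($b) <=> length($a) }
    keys %lexicon;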

my $sample = 'The due date of the assignment is a date in the fall.'; 
print "sample before: $sample\n"; 
$sample =~ s/($regexp)/$lexicon{$1}/g; 
print "sample after : $sample\n"; 
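The question reads its text from a file, so the same substitution can be applied line by line. The following is only a sketch under some assumptions: the lexicon is shortened, `translate_line` is a hypothetical helper name, and an in-memory filehandle stands in for `assign3.txt` so that the example runs on its own.

```perl
use strict;
use warnings;
use utf8;
binmode(STDOUT, ':encoding(UTF-8)');

# A shortened lexicon, just for this sketch.
my %lexicon = (
    'Due Date'   => 'Date de Remise',
    'Date'       => 'Date',
    'Assignment' => 'Devoir',
    'Problem'    => 'Problème',
);

# Longest keys first, metacharacters escaped, whole words only.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) }
    keys %lexicon;
my $regexp = qr/\b($alternation)\b/;

# translate_line is a hypothetical helper, not part of the answer above.
sub translate_line {
    my ($line) = @_;
    $line =~ s/$regexp/$lexicon{$1}/g;
    return $line;
}

# Stand-in for: open(my $fh, '<:encoding(UTF-8)', 'assign3.txt') or die ...
my $input = "Assignment 3\nDue Date: November 16\nProblem 1\n";
open(my $fh, '<', \$input) or die "Could not open in-memory file: $!";

my @translated;
while (my $line = <$fh>) {
    push @translated, translate_line($line);
}
close($fh);

print @translated;
```

Because all replacements happen in a single pass, the "Date" inside an already inserted "Date de Remise" is never matched a second time.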
2

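I think I understand the challenge you're facing. Because "Due Date" is two words, it has to be matched before "Date" is matched; otherwise you will get several incorrect translations. One way to handle this is to order your matches by word count, largest first, so that "Due Date" is processed before "Date".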

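If you convert your arrays to a hash (dictionary), you can order the keys by the number of spaces they contain and then iterate over them to do the actual replacement: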

#!/usr/bin/perl 
use strict; 
use warnings; 

#my @lexicon_en = ("Winter","Date", "Due Date", "Problem", "Summer","Mark","Fall","Assignment","November"); 
#my @lexicon_fr = ("Hiver", "Date", "Date de Remise","Problème","Été", "Point", "Automne", "Devoir", "Novembre"); 

# convert your arrays to a hash 
my %lexicon = (
    'Winter' => 'Hiver', 
    'Date' => 'Date', 
    'Due Date' => 'Date de Remise', 
    'Problem' => 'Problème', 
    'Summer' => 'Été', 
    'Mark' => 'Point', 
    'Fall' => 'Automne', 
    'Assignment' => 'Devoir', 
    'November' => 'Novembre', 
); 

# sort the keys so that those with the most spaces (i.e. the most words) come first
my @ordered_keys = sort { ($b =~ tr/ //) <=> ($a =~ tr/ //) } keys %lexicon;

my $sample = 'The due date of the assignment is a date in the fall.'; 

print "sample before: $sample\n"; 

foreach my $key (@ordered_keys) {
    # \Q...\E escapes regex metacharacters; \b keeps "Mark" from matching inside "Market"
    $sample =~ s/\b\Q$key\E\b/$lexicon{$key}/ig;
}

print "sample after : $sample\n"; 

Output:

sample before: The due date of the assignment is a date in the fall. 
sample after : The Date de Remise of the Devoir is a Date in the Automne. 

The next challenge is to make sure the case of the replacement matches the case of the text being replaced.
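One possible way to approach the case problem is sketched below. It is only a heuristic, not the method from either answer: the lexicon is keyed by lowercase phrase, and `match_case` is a hypothetical helper that copies a rough notion of the source text's case (all lowercase, all uppercase, or anything else) onto the translation.

```perl
use strict;
use warnings;
use utf8;
binmode(STDOUT, ':encoding(UTF-8)');

# Lexicon keyed by lowercase English phrase (shortened for this sketch).
my %lexicon = (
    'due date'   => 'Date de Remise',
    'date'       => 'Date',
    'assignment' => 'Devoir',
    'fall'       => 'Automne',
);

# match_case is a hypothetical helper: it applies the source text's
# overall case to the translation.
sub match_case {
    my ($source, $translation) = @_;
    return lc($translation) if $source eq lc($source);   # "due date"
    return uc($translation) if $source eq uc($source);   # "DUE DATE"
    return $translation;                                 # "Due Date" or any other mix
}

# Longest phrases first, metacharacters escaped.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) }
    keys %lexicon;

my $sample = 'The due date of the Assignment is a DATE in the fall.';

# /i matches any case, /e evaluates the replacement as Perl code.
(my $result = $sample) =~ s/\b($alternation)\b/match_case($1, $lexicon{lc $1})/gie;

print "sample before: $sample\n";
print "sample after : $result\n";
```

Looking the match up with `lc $1` means a single lowercase entry covers every spelling of the phrase, so the lexicon does not need a separate pair for each case variant.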

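Your code is very good. Can you explain more about the case-replacement challenge? Thanks – user3241846

@julian-ladisch has already handled it in his code sample. He added lowercase equivalents of the pairs, so that if "due date" appears rather than "Due Date", it is swapped for a replacement in the same case.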
