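For example, how can I match "Nation" in "Îñţérñåţîöñåļîžåţîöñ" without extra modules? Is this possible in newer Perl versions (5.14, 5.15, etc.)? How do I match a string with diacritics in Perl?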
I found the answer! Thanks to tchrist.
Right solution, matching with the UCA (thanks to https://stackoverflow.com/users/471272/tchrist).
# find start/end offsets of each matched UTF-8 substring (non-overlapping)
use 5.014;
use strict;
use warnings;
use utf8;
use Unicode::Collate;
binmode STDOUT, ':encoding(UTF-8)';
my $str = "Îñţérñåţîöñåļîžåţîöñ" x 2;
my $look = "Nation";
my $Collator = Unicode::Collate->new(
    normalization => undef, level => 1
);
my @match = $Collator->match($str, $look);
if (@match) {
    my $found = $match[0];
    my $f_len = length($found);
    say "match result: $found (length is $f_len)";
    my $offset = 0;
    # note: index() only finds exact repeats of the first matched substring
    while ((my $start = index($str, $found, $offset)) != -1) {
        my $end = $start + $f_len;
        say sprintf("found at: %s,%s", $start, $end);
        $offset = $end; # resume right after this match so adjacent matches are not skipped
    }
}
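As a variation on the loop above, here is a minimal sketch that asks the collator itself for the positions, using the `index` method of Unicode::Collate, which in list context returns the position and length of the matching part. The helper name `find_offsets` is made up for illustration and is not part of the module. As in the code above, `normalization => undef` is assumed, since the module refuses to run `index`/`match` with normalization enabled.

```perl
use 5.014;
use strict;
use warnings;
use utf8;
use Unicode::Collate;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: collect [start, end) offsets of every
# non-overlapping collator-level match of $look inside $str.
sub find_offsets {
    my ($collator, $str, $look) = @_;
    my @spans;
    my $offset = 0;
    # In list context index() returns (position, length), or an empty list.
    while (my ($start, $len) = $collator->index($str, $look, $offset)) {
        last if $len == 0;               # guard against an empty pattern
        push @spans, [$start, $start + $len];
        $offset = $start + $len;         # continue right after this match
    }
    return @spans;
}

my $collator = Unicode::Collate->new(normalization => undef, level => 1);
my $text = "Îñţérñåţîöñåļîžåţîöñ" x 2;
say "found at: $_->[0],$_->[1]" for find_offsets($collator, $text, "Nation");
```

Unlike the plain `index()` loop, this should also pick up later occurrences that are accented differently from the first match, because every comparison goes through the collator.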
Wrong (but working) solution
The magic piece of code is:
$str = Unicode::Normalize::NFD($str); $str =~ s/\pM//g;
Code example:
use 5.014;
use utf8;
use Unicode::Normalize;
binmode STDOUT, ':encoding(UTF-8)';
my $str = "Îñţérñåţîöñåļîžåţîöñ";
my $look = "Nation";
say "before: $str\n";
$str = NFD($str);
# M is short alias for \p{Mark} (http://perldoc.perl.org/perluniprops.html)
$str =~ s/\pM//g; # remove "marks"
say "after: $str";
say "is_match: ", $str =~ /$look/i || 0;
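The same trick can be wrapped in a small function. This is only a sketch: `strip_marks` is a hypothetical name, not something Unicode::Normalize provides. It also shows one known limitation of the approach: letters such as "ø" have no canonical decomposition, so NFD leaves them as they are and nothing is stripped.

```perl
use 5.014;
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFD);
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: decompose the text, then drop all combining marks.
sub strip_marks {
    my ($text) = @_;
    my $decomposed = NFD($text);      # e.g. "é" becomes "e" + U+0301
    $decomposed =~ s/\p{Mark}//g;     # remove the combining marks
    return $decomposed;
}

say strip_marks("café");    # prints "cafe"
say strip_marks("Søren");   # prints "Søren": "ø" does not decompose
```

The Unicode::Collate version does not rely on decomposition alone, which may be one reason to prefer it.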
+1 for the hairy example. – Bojangles
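I don't know if there is any direct support for this, but you could normalize to the fully decomposed form and then strip any character with the "combining" property (ISTR there is such a property, although I'm not sure what it's called). – tripleee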
Googling "perl remove all diacritics" turns up a lot of promising-looking matches. –