Unicode-ready wordsearch

Is this code usable? I really don't know which normalization form I should use (the only case where I noticed wrong output was with NFD).
#!/usr/local/bin/perl
use warnings;
use 5.014;
use utf8;
binmode STDOUT, ':encoding(utf-8)';
use Unicode::Normalize;
use Unicode::Collate::Locale;
use Unicode::GCString;
my $text = "my taxt täxt";
my %hash;
while ($text =~ m/(\p{Alphabetic}+(?:'\p{Alphabetic}+)?)/g) { #'
    my $word = $1;
    my $NFC_word = NFC($word);
    $hash{$NFC_word}++;
}
my $collator = Unicode::Collate::Locale->new(locale => 'DE');
for my $word ($collator->sort(keys %hash)) {
    my $gcword = Unicode::GCString->new($word);
    # Note: printf's field width counts code points, not grapheme clusters.
    printf "%-10.10s : %5d\n", $gcword, $hash{$word};
}
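A likely cause of the wrong NFD output: `printf`'s `%-10.10s` counts code points, not user-perceived characters. In NFD, `ä` is two code points (`a` plus U+0308 COMBINING DIAERESIS), so `täxt` is padded to ten code points but only nine visible columns. Handing `printf` a Unicode::GCString object does not change this, since `printf` formats the object's string value. The sketch below swaps in core Perl's `\X` (extended grapheme cluster) regex escape instead of Unicode::GCString and pads by grapheme count. `fit_graphemes` is a hypothetical helper, and it assumes every grapheme cluster occupies one column, which holds for Latin text like this but not for East Asian wide characters, where Unicode::GCString's `columns` method is the better measure.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFC NFD);

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: left-justify/truncate $str to $width grapheme
# clusters, assuming each cluster occupies exactly one column.
sub fit_graphemes {
    my ($str, $width) = @_;
    my @g = $str =~ /\X/g;               # split into grapheme clusters
    splice @g, $width if @g > $width;    # truncate, as %.10s would
    return join('', @g) . ' ' x ($width - @g);
}

for my $form (\&NFC, \&NFD) {
    my $word = $form->('täxt');
    # %-10.10s pads by code points: the NFD spelling has 5, so it ends up
    # one visible column short.
    printf "[%-10.10s] %d code points\n", $word, length $word;
    # Padding by grapheme clusters gives the same width for both forms.
    printf "[%s]\n", fit_graphemes($word, 10);
}
```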
It doesn't matter _which_ normalization you use, as long as you use the _same_ one for all the strings you compare! –
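That point applies to plain Perl string comparison: `eq` and hash lookups compare code point by code point, which is why the question's loop runs each word through `NFC` before using it as a `%hash` key. A minimal sketch of what happens without that step, using two spellings of the question's sample word `täxt`:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFC NFD);

# The same word spelled two ways: precomposed (NFC) and decomposed (NFD).
my @words = (NFC('täxt'), NFD('täxt'));

my (%raw, %normalized);
$raw{$_}++ for @words;               # keys compared code point by code point
$normalized{ NFC($_) }++ for @words; # one normalization form for every key

printf "raw keys: %d, normalized keys: %d\n",
    scalar(keys %raw), scalar(keys %normalized);
# prints "raw keys: 2, normalized keys: 1"
```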
@Kerrek That is not correct. Unicode::Collate (and its subclass U::C::Locale) and Unicode::GCString are both specifically designed so that normalization **doesn't matter**. – tchrist
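A minimal sketch of the Unicode::Collate side, using only core modules: the collator normalizes its input itself (to NFD unless its `normalization` option says otherwise), so NFC and NFD spellings of a word compare equal and sort into the same order. The word list is taken from the question's sample text.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFC NFD);
use Unicode::Collate::Locale;

binmode STDOUT, ':encoding(UTF-8)';

my $collator = Unicode::Collate::Locale->new(locale => 'DE');

my @nfc = map { NFC($_) } qw(täxt taxt my);
my @nfd = map { NFD($_) } @nfc;

# Canonically equivalent spellings compare equal under the collator ...
for my $i (0 .. $#nfc) {
    printf "%-5s NFC vs NFD: %s\n", $nfc[$i],
        $collator->eq($nfc[$i], $nfd[$i]) ? 'equal' : 'different';
}

# ... and both lists sort into the same order (NFD results mapped back
# to NFC so the two lists can be compared with plain eq).
my @sorted_nfc = $collator->sort(@nfc);
my @sorted_nfd = map { NFC($_) } $collator->sort(@nfd);
print 'same order: ', ("@sorted_nfc" eq "@sorted_nfd" ? 'yes' : 'no'), "\n";
```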