Answers
13
unpack will be more efficient than split and ord, because it doesn't have to create a bunch of temporary one-character strings:
use utf8;
my $str = '中國c'; # Chinese language of china
my @codepoints = unpack 'U*', $str;
print join(',', @codepoints) . "\n"; # prints 20013,22283,99
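The example above relies on use utf8 because the string is a literal in the source file. As a minimal sketch for the other common case, assuming the text arrives as raw UTF-8 bytes (the $bytes value below is a made-up stand-in for data read from a file or socket), the string can be decoded into characters with the core Encode module before calling unpack:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes for the same text as the '中國c' literal above.
# Hypothetical input, standing in for data read from a file or socket.
my $bytes = "\xE4\xB8\xAD\xE5\x9C\x8Bc";

# Decode the bytes into a character string, then extract the code points.
my $chars      = decode('UTF-8', $bytes);
my @codepoints = unpack 'U*', $chars;

print join(',', @codepoints), "\n";   # prints 20013,22283,99
```

If the input comes from a filehandle, opening it with an :encoding(UTF-8) layer should make the explicit decode step unnecessary.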
A quick benchmark shows it is about 3 times faster than split+ord:
use utf8;
use Benchmark 'cmpthese';
my $str = '中國中國中國中國中國中國中國中國中國中國中國中國中國中國c';
cmpthese(0, {
    'unpack'     => sub { my @codepoints = unpack 'U*', $str; },
    'split-map'  => sub { my @codepoints = map { ord } split //, $str },
    'split-for'  => sub { my @cp; for my $c (split(//, $str)) { push @cp, ord($c) } },
    'split-for2' => sub { my $cp; for my $c (split(//, $str)) { $cp = ord($c) } },
});
Results:
               Rate split-map split-for split-for2 unpack
split-map   85423/s        --       -7%       -32%   -67%
split-for   91950/s        8%        --       -27%   -64%
split-for2 125550/s       47%       37%         --   -51%
unpack     256941/s      201%      179%       105%     --
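The difference is less pronounced with a shorter string, but unpack is still more than twice as fast. (split-for2 is a bit faster than the other split variants because it doesn't build a list of codepoints.)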
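To check the short-string claim on your own machine, a sketch along these lines could be used. The three-character input and the 1-CPU-second run time are my choices for illustration, not the settings behind the numbers above, and the rates will vary by Perl version and hardware:

```perl
use strict;
use warnings;
use utf8;
use Benchmark 'cmpthese';

my $short = '中國c';   # short input, chosen here for illustration

# Sanity check: both approaches should yield the same code points.
my @via_unpack = unpack 'U*', $short;
my @via_split  = map { ord } split //, $short;
die "results differ" unless "@via_unpack" eq "@via_split";

# A negative count tells Benchmark to run each case for at least
# that many CPU seconds instead of a fixed number of iterations.
cmpthese(-1, {
    'unpack'    => sub { my @cp = unpack 'U*', $short },
    'split-map' => sub { my @cp = map { ord } split //, $short },
});
```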
3
foreach my $c (split(//, $str))
{
    print ord($c), "\n";
}
Or compressed into a single line: my @chars = map { ord } split //, $str;
Dumped with Data::Dumper, this produces:
$VAR1 = [
20013,
22283,
99
];
3
For UTF-8 in the source code to be recognized as such, you must use utf8; beforehand:
$ perl
use utf8;
my $str = '中國c'; # Chinese language of china
foreach my $c (split(//, $str))
{
    print ord($c), "\n";
}
__END__
20013
22283
99
Or, more concisely:
print join ',', map ord, split //, $str;
2
http://www.perl.com/pub/2012/04/perlunicook-standard-preamble.html
#!/usr/bin/env perl
use utf8; # so literals and identifiers can be in UTF-8
use v5.12; # or later to get "unicode_strings" feature
use strict; # quote strings, declare variables
use warnings; # on by default
use warnings qw(FATAL utf8); # fatalize encoding glitches
use open qw(:std :utf8); # undeclared streams in UTF-8
# use charnames qw(:full :short); # unneeded in v5.16
# http://perldoc.perl.org/functions/sprintf.html
# vector flag
# This flag tells Perl to interpret the supplied string as a vector of integers, one for each character in the string.
my $str = '中國c';
printf "%*vd\n", ",", $str;
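A few variations on the vector flag, as a sketch based on the sprintf documentation linked above: without the *, the join string defaults to a period; the flag combines with other integer conversions such as X; and the joined output can be split back into a list. The variable names are mine.

```perl
use strict;
use warnings;
use utf8;

my $str = '中國c';

# Plain %vd joins the code points with "." by default.
my $dotted = sprintf "%vd", $str;            # 20013.22283.99

# %0*v4X: zero-padded, custom join string (" "), width 4, uppercase hex.
my $hex = sprintf "%0*v4X", " ", $str;       # 4E2D 570B 0063

# Splitting the joined output gives the code points back as a list.
my @codepoints = split /,/, sprintf("%*vd", ",", $str);

print "$dotted\n$hex\n", scalar(@codepoints), "\n";
```

This is handy for display, but when the goal is a list of integers, unpack 'U*' avoids the round trip through a formatted string.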
"Chinese language of china"? Why '中國'? – Zaid 2010-08-22 20:19:40
I guess it should read *the Chinese word for "China"*. – daxim 2010-08-23 09:39:42