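Heh. :-) This has to do with the gradually growing Unicode support in recent versions of Perl, and with the \C regex escape used by the URI module (more precisely, by URI::Escape). \C matches a single byte of Perl's internal string representation, so its result depends on whether a string happens to be stored internally as UTF-8 or not. Read this thread on perl-unicode from 2010 (Don't use the \C escape in regexes - Why not?) for background.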
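To see why a byte-level escape gives results that depend on a string's history, here is a minimal sketch (not URI::Escape's own code). It uses the `bytes` pragma to look at the same internal bytes that \C would match, rather than \C itself, because newer Perls removed \C. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $s = "\xE8";          # one character, U+00E8 (e with grave), stored as one byte
my $t = $s;
utf8::upgrade($t);       # same character, now stored internally as UTF-8

my $same_chars = ($s eq $t) ? 1 : 0;      # 1: equal at the character level
my @char_len   = (length $s, length $t);  # (1, 1): same character count

my (@byte_len, @hex);
{
    use bytes;           # length/substr/ord now see internal bytes, as \C would
    @byte_len = (length $s, length $t);   # (1, 2)
    @hex = map { sprintf '%02X', ord substr($t, $_, 1) } 0 .. length($t) - 1;
}

print "same characters: $same_chars\n";
print "character lengths: @char_len\n";
print "internal byte lengths: @byte_len\n";
print "internal bytes of upgraded copy: @hex\n";   # C3 A8
```

Code that escapes byte by byte therefore sees %E8 for the first string and %C3%A8 for the second, even though Perl considers them equal.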
Why the URI module? Because HTTP::Request::Common uses it for form and URL encoding.
Meanwhile, here is a script I wrote to remind myself how tricky this issue is, especially since URI is such a widely used module:
use 5.010;
use utf8;
# Perl and URI.pm might behave differently when you encode your script in
# Latin1 and drop the utf8 pragma.
use Encode;
use URI;
use Test::More;
use constant C3A8 => 'text=%C3%A8';
use constant E8 => 'text=%E8';
diag "Perl $^V";
diag "URI.pm $URI::VERSION";
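# A character string ("è", U+00E8) and its Latin-1 encoding (one octet, 0xE8).
my $chars = 'è';
my $octets = encode 'iso-8859-1', $chars;
my $uri = URI->new('http:');
# Character string: escaped as UTF-8 (%C3%A8) in all tested versions.
$uri->query_form(text => $chars);
is $uri->query, C3A8, C3A8;
# Expected results for the octet string, before and after utf8::upgrade,
# depend on the Perl and URI.pm versions.
my @exp;
given ("$^V $URI::VERSION") {
when ('v5.12.3 1.56') { @exp = ( E8, C3A8) }
when ('v5.10.1 1.54') { @exp = (C3A8, C3A8) }
when ('v5.10.1 1.58') { @exp = (C3A8, C3A8) }
default { die 'not tested :-)' }
}
# Octet string, not upgraded (stored internally as one byte per character).
$uri->query_form(text => $octets);
is $uri->query, $exp[0], $exp[0];
# Same octet string, but now stored internally as UTF-8.
utf8::upgrade $octets;
$uri->query_form(text => $octets);
is $uri->query, $exp[1], $exp[1];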
done_testing;
So what I get (on Windows and Cygwin) is:
C:\Windows\system32 :: perl \Opt\Cygwin\tmp\uri.pl
# Perl v5.12.3
# URI.pm 1.56
ok 1 - text=%C3%A8
ok 2 - text=%E8
ok 3 - text=%C3%A8
1..3
And:
~/comp > perl /tmp/uri.pl
# Perl v5.10.1
# URI.pm 1.54
ok 1 - text=%C3%A8
ok 2 - text=%C3%A8
ok 3 - text=%C3%A8
1..3
UPDATE
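You could hand-craft the request body:
use strict;
use warnings;
use utf8;
use Encode;
use HTTP::Request;
use LWP::UserAgent;
# Encode the characters explicitly, then build the body from the octets.
my $chars = 'ölè';
my $octets = encode('iso-8859-1', $chars);
# Percent-encode every non-ASCII octet by hand, bypassing URI::Escape.
my $body = 'text=' .
    join '',
    map { my $o = ord $_; $o < 128 ? $_ : sprintf '%%%X', $o }
    split //, $octets;
my $uri = 'http://localhost:8080/';
# Declare the body as form data so the server parses it as such.
my $req = HTTP::Request->new(POST => $uri,
    [ 'Content-Type' => 'application/x-www-form-urlencoded' ], $body);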
print $req->as_string;
my $ua = LWP::UserAgent->new;
my $rsp = $ua->request($req);
print $rsp->as_string;
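The body-building step can be checked without a server. A minimal sketch, using a hypothetical helper `form_escape_octets` (not part of LWP or URI) that applies the same ord-based escaping to explicitly encoded octets:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: percent-encode every octet >= 128 of an already
# encoded byte string, leaving ASCII characters untouched.
# The input must be octets (the result of encode), not a character string.
sub form_escape_octets {
    my ($octets) = @_;
    return join '',
        map { my $o = ord $_; $o < 128 ? $_ : sprintf '%%%02X', $o }
        split //, $octets;
}

my $text   = "\x{F6}l\x{E8}";    # "ölè", written with escapes to avoid use utf8
my $latin1 = 'text=' . form_escape_octets(encode('iso-8859-1', $text));
my $utf8   = 'text=' . form_escape_octets(encode('UTF-8', $text));

print "$latin1\n";   # text=%F6l%E8
print "$utf8\n";     # text=%C3%B6l%C3%A8
```

Because ord works on characters rather than on internal bytes, the output does not depend on whether the octet string was upgraded; the choice of encoding is made explicitly by the encode call. Like the hand-crafted body, it does not escape reserved ASCII characters such as &, =, + or space, so it only suits values that contain none of them.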
How did you determine the request content? The network sniffer definitely says 'text=%E8': http://i.stack.imgur.com/rM3xS.png – daxim
Interesting. I ran 'nc' on port 8080 and got 'text=%C3%A8'. Specs: MacOS X 10.6, perl v5.10.0, libwww-perl/5.837. – Alessandro