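Perl regex on unknown format

I am trying to replace some decoded characters (for example \x{2013}, \u{38}, etc.) with a space. Below is the regex I am using. However, I either get a "Wide character" error, or some characters are still not decoded correctly in the printed output. I think the text is not matching the expressions, and I have tried several different approaches. I want all of those decoded characters replaced with a space or a -. Please find my non-working code below: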
use strict;
use warnings;
my $sai = qq(Asdf \\u2013abc<br />jkl-abcd<br /><div>!\\"\\u00A3$%^&*()-_ =+</div><div>{</div><div>}</div><div>[</div><div>]</div><div>: ; @ \' # ~*,,</div><div>? > < . ,/| \\\\ ` /* - + . </div><div> </div><div> 12345</div><div> </div><ul><li><span obj=\\"venit-rte-obj-026f68485\\">\\u00FC<span obj=\\"venit-rte-obj-0196185f4\\"> </span></span>Abcd</li><ul><li><span obj=\\"venit-rte-obj-026f68485\\">v<span obj=\\"venit-rte-obj-0196185f4\\"> </span></span>Abcd</li><li><span obj=\\"venit-rte-obj-026f68485\\">v<span obj=\\"venit-rte-obj-0196185f4\\"> </span></span>Jkl</li><ul><li><span obj=\\"venit-rte-obj-0a7a49fef\\">\\u00B7<span obj=\\"venit-rte-obj-0196185f4\\"> </span></span>Asdf</li></ul><li><span obj=\\"venit-rte-obj-026f68485\\">\\u00A7<span obj=\\"venit-rte-obj-0196185f4\\"> </span></span>test</li></ul></ul><div> </div><div> </div><div><ul><li><span obj=\\"venit-rte-obj-026f68485\\">\\u00D8<span obj=\\"venit-rte-obj-0196185f4\\"> </span></span>Dfgst</li><li><span obj=\\"venit-rte-obj-026f68485\\">\\u00D8<span obj=\\"venit-rte-obj-0196185f4\\"> </span></span>Sdrgdg</li><ul><li><span obj=\\"venit-rte-obj-0a7a49fef\\">\\u00B7<span obj=\\"venit-rte-obj-0196185f4\\"> </span></span>Abcd</li></ul></ul>Testing \\u2013 code</div> \x{2013};\x{2013}abcjkl-abcd!\"\x{a3} \$%^&*()-_=+{}[]: ;\@ ' # ~*,,? > AbcdTesting \x{2013} code67\x{fc} Abcdv Abcdv Jkl\x{b7} Asdfs\x{a7} test \x{d8} Dfgst\x{d8} Sdrgdg\x{b7});
for ($sai)
{
    s/[^\p{ASCII}]//g;
    s/\\u[0-9]+/-/g;
    s/\\x[a-z0-9]/-/g;
}
print $sai;
Now only things like x{}, D8 and so on have not disappeared.
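One likely reason fragments such as D8 survive is that `[0-9]+` only matches decimal digits, so on a literal `\u00D8` the match stops after `\u00` and leaves `D8` behind. Likewise, `\\x[a-z0-9]` consumes a single character after `\x` and never matches the braces. The following is a minimal sketch, not a definitive fix; the `$text` sample is a hypothetical, shortened stand-in for the real input, and it assumes that `\u` escapes always carry exactly four hex digits:

```perl
use strict;
use warnings;

# Hypothetical sample: literal \uXXXX and \x{...} escapes stored as plain text
# (single quotes keep the backslashes), plus one real U+2013 character appended.
my $text = 'Asdf \u2013abc \u00D8 Dfgst \x{fc} end ' . "\x{2013}";

for ($text) {
    s/\\u[0-9A-Fa-f]{4}/-/g;     # literal \u followed by exactly four hex digits
    s/\\x\{[0-9A-Fa-f]+\}/-/g;   # literal \x{...} with hex digits inside the braces
    s/[^\p{ASCII}]/-/g;          # any remaining real non-ASCII character
}

print "$text\n";   # prints: Asdf -abc - Dfgst - end -
```

The `{4}` quantifier is deliberate: with `+`, the hex class would also swallow the letters `abc` that directly follow `\u2013` in the sample.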
Add ['use utf8;'](https://ideone.com/Wc9XRz).
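As for the "Wide character in print" message itself: Perl emits it as a warning when a string containing code points above 255 is printed to a filehandle that has no encoding layer. The `utf8` pragma only tells Perl that the source file is UTF-8 encoded and does not change how output is written. A minimal sketch, assuming the output is meant to be UTF-8 (the `$dash` variable is purely illustrative):

```perl
use strict;
use warnings;

# Encode everything written to STDOUT as UTF-8, so that characters
# above code point 255 are printed without a "Wide character" warning.
binmode(STDOUT, ':encoding(UTF-8)');

my $dash = "\x{2013}";   # EN DASH, a single character outside Latin-1
print "Testing $dash code\n";
```

With the layer in place, non-ASCII characters can be printed as they are, so stripping them is only needed if the output really has to be pure ASCII.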