Strange bug with multibyte strings and lookaround

Why does the following code behave differently for different multibyte strings?

echo preg_replace('@(?=\pL)@u', '*', 'م'); // prints: '*م'  ✓ 
echo preg_replace('@(?=\pL)@u', '*', 'ض'); // prints: '*ض'  ✓ 
echo preg_replace('@(?=\pL)@u', '*', 'غ'); // prints: '*�*�' ✗ 
echo preg_replace('@(?=\pL)@u', '*', 'ص'); // prints: '*�*�' ✗

See: http://3v4l.org/fvab1

2013-02-18 PHPst

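It doesn't recognize the 'غ' character. IMHO it looks like a bug in the PCRE library, but this being PHP, it's hard to say whether you need to enable something... – 2013-02-18 17:09:07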

This works fine: echo preg_replace('/(.+)/', '*$1', 'غ'); – 2013-02-18 18:06:25

Strangely, it seems to work in older versions: http://3v4l.org/0Pq36 – deceze 2013-02-18 20:15:25
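
One way to examine the failing cases is to look at the second UTF-8 byte of each test string. The sketch below is only a diagnostic, not an explanation confirmed anywhere in this thread: it assumes that a match attempted at byte offset 1 would see the continuation byte as a standalone code point (U+0080 to U+00BF) and checks whether that code point is a letter. The helper names `latin1_to_utf8` and `second_byte_is_letter` are made up for this illustration.

```php
<?php
// Hypothetical helper: encode a code point in the range U+0080..U+00FF as UTF-8.
function latin1_to_utf8(int $cp): string
{
    return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: treat the second byte of a two-byte UTF-8 character
// as a code point of its own and ask PCRE whether that code point is a letter.
function second_byte_is_letter(string $char): bool
{
    $cp = ord($char[1]);
    return preg_match('/^\pL$/u', latin1_to_utf8($cp)) === 1;
}

// The four strings from the question, written as raw UTF-8 bytes.
$samples = [
    "\xD9\x85", // م U+0645
    "\xD8\xB6", // ض U+0636
    "\xD8\xBA", // غ U+063A
    "\xD8\xB5", // ص U+0635
];

foreach ($samples as $char) {
    printf(
        "%s  second byte 0x%02X  letter on its own: %s\n",
        $char,
        ord($char[1]),
        second_byte_is_letter($char) ? 'yes' : 'no'
    );
}
```

With a current PCRE build, the second bytes of غ (0xBA, U+00BA) and ص (0xB5, U+00B5) are reported as letters, while those of م (0x85) and ض (0xB6) are not. That lines up with the two outputs marked ✗ in the question and is consistent with a replacement being attempted in the middle of a character, but nothing in the thread confirms the cause.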

You need to include modifier letters (Lm) as well. See the script below, which iterates over the entire Arabic Unicode block:

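<?php
// Encode a code point in the two-byte UTF-8 range (U+0080..U+07FF).
function uchar_2($dec)
{
    $utf = chr(192 + (($dec - ($dec % 64)) / 64)); // lead byte: 110xxxxx
    $utf .= chr(128 + ($dec % 64));                // continuation byte: 10xxxxxx

    return $utf;
}

$issues = 0;
$count = 0;
// Walk the Arabic block, U+0600 (1536) through U+06FF (1791).
for ($dec = 1536; $dec <= 1791; $dec++) {
    $char = uchar_2($dec);
    // Count an issue whenever the replacement changes the string.
    if (preg_replace('@^(?=\pLm)$@u', '*', $char) !== $char) {
        printf("Issue with %s (%s)\n", $dec, $char);
        $issues++;
    }
    $count++;
}

printf("Found %d issues in %d rows\n", $issues, $count);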

Without Lm, this fails for about half of the characters.

2013-02-18 20:22:06

In your code, there are no issues even with '@^(?=\pL)$@u'. But if you use '@(?=\pL)@u', it reports some issues. Your code with '\pLm' shows the desired output, but it has to work with '\pL' as well. – PHPst 2013-02-19 05:28:48
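
A note on the syntax quoted in this exchange: in PCRE, the brace-less form of `\p` takes a single letter, so `\pLm` is read as `\pL` followed by a literal `m`. A two-letter category such as Lm needs braces: `\p{Lm}`. The sketch below checks this with U+0640 ARABIC TATWEEL, which has general category Lm; the choice of test character is an assumption for illustration and does not come from the thread.

```php
<?php
$tatweel = "\xD9\x80"; // ـ U+0640 ARABIC TATWEEL, general category Lm

// \pLm without braces means "any letter, then a literal m".
var_dump(preg_match('/^\pLm$/u', 'am'));      // int(1): letter followed by 'm'
var_dump(preg_match('/^\pLm$/u', $tatweel));  // int(0): no trailing 'm'

// \p{Lm} with braces selects the Lm category itself.
var_dump(preg_match('/^\p{Lm}$/u', $tatweel)); // int(1)

// Lm is a subset of L, so \pL already covers modifier letters.
var_dump(preg_match('/^\pL$/u', $tatweel));    // int(1)
```

Because a lookahead `(?=\pLm)` cannot succeed on a one-character string, a pattern containing it leaves every such input unchanged, which may be why the loop in the answer finds no issues.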
