Check for truncated HTML entity in substr

If I have:

$output = substr($str, 0, 3);

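and $str has the value '&agrave;&agrave;bcde', $output comes back with an entity cut off partway, e.g. ending in '&ag' instead of a complete '&agrave;'. I want the output to be 'ààb'. I tried mb_substr($str, 0, 3, 'UTF-8') and got the same problem. Using html_entity_decode on $str gave me a 500 Internal Server Error. Edit: I noticed that the 500 error only happens when the truncated part of the string is part of an HTML entity.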

Source

2013-12-10 Aditya

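If you are dealing with encoded HTML, you have to decode it to plain text, then take your substring, then re-encode. String functions cannot be expected to handle HTML character entities.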
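
The decode, cut, re-encode sequence described in this comment could be sketched as follows. The helper name `truncate_entities` is hypothetical, and the sketch assumes the text is UTF-8, contains entities but no HTML tags, and that the mbstring extension is available.

```php
<?php
// Hypothetical helper: truncate entity-encoded text without splitting an entity.
// Assumes $html is UTF-8 and contains entities but no HTML tags.
function truncate_entities(string $html, int $length): string
{
    // 1. Decode entities into real characters.
    $plain = html_entity_decode($html, ENT_QUOTES, 'UTF-8');
    // 2. Cut by characters rather than bytes.
    $cut = mb_substr($plain, 0, $length, 'UTF-8');
    // 3. Re-encode so the result is safe to output as HTML again.
    return htmlentities($cut, ENT_QUOTES, 'UTF-8');
}

echo truncate_entities('&agrave;&agrave;bcde', 3), "\n"; // &agrave;&agrave;b
```

If the page is served as UTF-8, swapping `htmlentities` for `htmlspecialchars` in step 3 would leave 'à' as a literal character while still escaping `&`, `<`, `>` and quotes.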

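You need to use the correct encoding. $str may not be UTF-8. Only you know the encoding. PHP can guess, but not reliably.

Using html_entity_decode() is the way to go.

Or you can do the calculation yourself: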

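$str = 'Hello &amp; byeye!';

// mb_ functions shouldn't be necessary because all multibyte chars are HTML-encoded
$output = substr($str, 0, 8);
var_dump($output); // string(8) "Hello &a"

// The cut landed inside an entity if the last '&' has no ';' after it
$amp = strrpos($output, '&');
$semi = strrpos($output, ';');
$cutoff = $amp !== false && ($semi === false || $semi < $amp);
if ($cutoff) {
    // Extend the substring up to and including the entity's closing ';'
    $end = strpos($str, ';', strlen($output));
    if ($end !== false) {
        $output = substr($str, 0, $end + 1);
    }
    var_dump($output); // string(11) "Hello &amp;"
}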

Something like that. But html_entity_decode() is better, so turn on error_reporting and display_errors and see what is going wrong.

Source
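
One way to surface the error behind the 500 response while debugging is sketched below. It assumes a development environment (showing errors on a production site is usually not advisable), and the sample string is only an illustration.

```php
<?php
// Report every error and print it to the output (development only).
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Sample input, assumed for illustration.
$str = '&agrave;&agrave;bcde';
$decoded = html_entity_decode($str, ENT_QUOTES, 'UTF-8');
var_dump($decoded);
```

If nothing is printed even with these settings, the web server's error log is the next place to look.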

2013-12-10 18:50:38 Rudie

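If you want it to return an accented character, you have to convert your string to real UTF-8 characters (or whatever encoding you prefer) instead of &agrave; and the like. PHP treats each character of the entity as a separate character, so substr has no way to recognize the whole &agrave; as a single character.

You can use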

// $str = '&agrave; &agrave;cde'
$str = html_entity_decode($str, ENT_COMPAT, 'UTF-8');
// $str = 'à àcde'
// mb_substr counts characters; plain substr counts bytes and could split 'à'
$output = mb_substr($str, 0, 3, 'UTF-8');
// $output = 'à à'

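I know you have apparently tried html_entity_decode, but I am fairly sure the function is not broken. Is there any character in the string that has already been converted to a different encoding? Could you echo the string that html_entity_decode has trouble with?

Source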
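
To answer that question, one could dump the string's bytes and check whether it is valid UTF-8 before decoding. The following is a rough sketch; the sample values are assumptions, and `mb_check_encoding` requires the mbstring extension.

```php
<?php
// Sample value, assumed for illustration; replace with the string that fails.
$str = '&agrave;&agrave;bcde';

// Show the raw bytes so stray non-ASCII bytes become visible.
echo bin2hex($str), "\n";

// True when the bytes form valid UTF-8 (pure ASCII also passes).
$isUtf8 = mb_check_encoding($str, 'UTF-8');
var_dump($isUtf8);

// A lone 0xE0 byte is 'à' in ISO-8859-1 but is not valid UTF-8.
$latin1 = "\xE0";
var_dump(mb_check_encoding($latin1, 'UTF-8'));
```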

2013-12-10 18:57:19
