Explode by the special character ·

I have a string like this: Suède · Slovénie

I need to explode it on ·. I have tried various solutions, such as:

preg_split("/[?·]/",strip_tags($single->children(2)->outertext)) 

explode(chr(149), strip_tags($single->children(2)->outertext)); 

explode(utf8_encode('·'),strip_tags($single->children(2)->outertext)); 

explode('·',strip_tags($single->children(2)->outertext));

But none of these solutions works for me! Can anyone tell me how to do this?

Source

2016-05-30 daniyalahmad

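What charset are you using? In UTF-8, 'è' is two bytes (0xC3, 0xA8), and 'explode()', like the other PHP string functions, works on a byte basis – johannes

preg_split on space, dot, space? I mean the dot in whatever charset it is – Andreas

I think you can teach the 'preg_*' functions to handle Unicode strings correctly (I assume UTF-8) with the 'u' flag, so 'preg_split('/[?·]/u', ..)' would also do the job. However, using 'mb_split()' as Marcin suggests is better, because it is more expressive. –

You should use mb_split() instead: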

var_dump(mb_split('·', 'Suède · Slovénie'));

which gives:

array(2) { 
    [0]=> 
    string(7) "Suède " 
    [1]=> 
    string(10) " Slovénie" 
}

Source

2016-05-30 16:35:41

' · ' (with the surrounding spaces) would probably be better; it saves a trim pass over the results – strangeqargo
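A minimal sketch of splitting on the separator together with its surrounding whitespace, so that no separate trim pass is needed. It assumes the string is valid UTF-8 and that the mbstring extension is loaded; the variable names are only illustrative. The sample text is written with byte escapes so the result does not depend on how the file is saved.

```php
<?php
// "Suède · Slovénie" as explicit UTF-8 bytes (assumed input encoding).
$str = "Su\xC3\xA8de \xC2\xB7 Slov\xC3\xA9nie";

// Option 1: put the spaces into the delimiter itself.
// explode() compares bytes, which is safe here because the delimiter
// is a complete UTF-8 sequence.
$bySpacedDot = explode(" \xC2\xB7 ", $str);

// Option 2: let the pattern absorb any amount of whitespace around the dot.
mb_regex_encoding('UTF-8');
$byPattern = mb_split("\\s*\xC2\xB7\\s*", $str);

var_dump($bySpacedDot === $byPattern); // bool(true)
print_r($byPattern);
```

Option 1 only matches when exactly one space sits on each side of the dot; option 2 is more tolerant of irregular spacing.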

This seems to work for the given string, but maybe not for every string.

preg_split("/\b (\W+) \b/", $str);

Source

2016-05-30 16:42:12 Andreas

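Your file most likely uses UTF-8. In UTF-8, · consists of two bytes (0xC2, 0xB7), and an expression such as "/[?·]/" will break on either of those bytes. Instead, you have to use the u modifier to enable UTF-8 mode: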

$ php -r 'print_r(preg_split("/[?·]/u", "Suède·Slovénie"));' 
Array 
(
    [0] => Suède 
    [1] => Slovénie 
)

Even better is mb_split(), a multibyte-aware split function, but it is not always available.
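To make the byte-versus-character difference concrete, here is a small sketch comparing the same character class with and without the u modifier. The second sample string, "1÷2·3", is an illustration added here and is not from the question; it was picked because ÷ (0xC3, 0xB7) shares a byte with ·.

```php
<?php
$dot   = "\xC2\xB7";                          // U+00B7 MIDDLE DOT in UTF-8
$str   = "Su\xC3\xA8de\xC2\xB7Slov\xC3\xA9nie"; // "Suède·Slovénie"
$mixed = "1\xC3\xB72\xC2\xB73";               // "1÷2·3" (hypothetical input)

// Byte mode: the class contains the bytes '?', 0xC2 and 0xB7, so the
// two-byte separator is matched twice and leaves an empty element.
$byteMode = preg_split("/[?" . $dot . "]/", $str);
var_dump(count($byteMode)); // int(3)

// UTF-8 mode: the class contains the characters '?' and U+00B7.
$utf8Mode = preg_split("/[?" . $dot . "]/u", $str);
var_dump(count($utf8Mode)); // int(2)

// Byte mode can also cut through an unrelated character: the 0xB7 byte
// inside "÷" matches, leaving a dangling 0xC3 in the first piece.
$broken = preg_split("/[?" . $dot . "]/", $mixed);
var_dump(preg_match('//u', $broken[0]) === 1); // bool(false): not valid UTF-8

$intact = preg_split("/[?" . $dot . "]/u", $mixed);
var_dump(count($intact)); // int(2)
```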

Source

2016-05-30 16:43:29 johannes

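It looks like you are using simplehtmldom and it is not encoding the characters correctly. Use str_get_html as follows:

//mb_convert_encoding will try to detect the `$html` encoding and convert it to `UTF-8`
//argument order: mb_convert_encoding($string, $to_encoding, $from_encoding)
$html = str_get_html(mb_convert_encoding(file_get_contents("http://somesite.com"), 'UTF-8', 'auto'));

Then you can simply use: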

explode('·',strip_tags($single->children(2)->outertext));
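As a rough sketch of why the conversion matters, the snippet below starts from the same text in ISO-8859-1, where · is the single byte 0xB7, and converts it to UTF-8 before splitting. The source encoding is an assumption made for illustration; the real page may use a different one, and the mbstring extension must be loaded.

```php
<?php
// Hypothetical input: "Suède · Slovénie" as ISO-8859-1 bytes.
$latin1 = "Su\xE8de \xB7 Slov\xE9nie";

// Convert only when the bytes are not already valid UTF-8, to avoid
// double-encoding. Argument order is ($string, $to_encoding, $from_encoding).
$utf8 = mb_check_encoding($latin1, 'UTF-8')
    ? $latin1
    : mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// After conversion the separator is the two-byte UTF-8 sequence 0xC2 0xB7.
$parts = explode("\xC2\xB7", $utf8);
print_r($parts);
```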

Source

2016-05-30 16:49:00

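I have found the solution: in the markup, · is actually written as the HTML entity &middot;, so that entity is what we need to split on.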

explode('&middot;',$str);
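An alternative sketch, assuming the markup may contain either the entity or the raw character: decode the entities first, then split on the UTF-8 character. The sample $html value is made up for illustration.

```php
<?php
// Hypothetical cell content using HTML entities.
$html = "<td>Su&egrave;de &middot; Slov&eacute;nie</td>";

// Strip the tags, then turn entities such as &middot; into UTF-8 characters.
$text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Split on the middle dot (0xC2 0xB7 in UTF-8) and trim the ASCII spaces.
$parts = array_map('trim', explode("\xC2\xB7", $text));
print_r($parts);
```

This also decodes the accented letters, which splitting on the literal '&middot;' string would leave as entities.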

Source

2016-05-30 18:10:19 daniyalahmad
