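OK, so I'm basically banging my head against the wall with this one. I'm trying to remove HTML tags (plus their content) from a string.
Here is the code: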
<?php
$s = "385,178<ref name=\"land area\">Data is accessible by following \"Create tables and diagrams\" link on the following site, and then using table 09280 \"Area of land and fresh water (km²) (M)\" for \"The whole country\" in year 2013 and summing up entries \"Land area\" and \"Freshwater\": {{cite web |url=http://www.ssb.no/en/natur-og-miljo/statistikker/arealdekke |title=Area of land and fresh water, 1 January 2013 |publisher=[[Statistics Norway]] |date=28 May 2013 |accessdate=23 November 2013}}</ref>";
function removeHTMLTags($str) {
    $r = '/(\\<br\\>)|(\\<br\/\\>)|(\\<(.+?)(\\s*[^\\<]+)?\\>(.+)?\\<\\\\\/\\1\\>)|(\\<ref\\sname=([^\\<]+?)\/\\>)/';
    echo "Preg_matching : $str\n\n";
    echo "Regex : $r\n\n";
    return preg_replace($r,'',$str);
}
echo removeHTMLTags($s);
?>
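What I'm trying to do is basically get rid of the <ref name="... </ref> part (and any other tags that may appear as well).
However, this is what I get (in other words, the exact same string, with nothing replaced):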
Preg_matching : 385,178<ref name="land area">Data is accessible by following "Create tables and diagrams" link on the following site, and then using table 09280 "Area of land and fresh water (km²) (M)" for "The whole country" in year 2013 and summing up entries "Land area" and "Freshwater": {{cite web |url=http://www.ssb.no/en/natur-og-miljo/statistikker/arealdekke |title=Area of land and fresh water, 1 January 2013 |publisher=[[Statistics Norway]] |date=28 May 2013 |accessdate=23 November 2013}}</ref>
Regex : /(\<br\>)|(\<br\/\>)|(\<(.+?)(\s*[^\<]+)?\>(.+)?\<\\\/\1\>)|(\<ref\sname=([^\<]+?)\/\>)/
385,178<ref name="land area">Data is accessible by following "Create tables and diagrams" link on the following site, and then using table 09280 "Area of land and fresh water (km²) (M)" for "The whole country" in year 2013 and summing up entries "Land area" and "Freshwater": {{cite web |url=http://www.ssb.no/en/natur-og-miljo/statistikker/arealdekke |title=Area of land and fresh water, 1 January 2013 |publisher=[[Statistics Norway]] |date=28 May 2013 |accessdate=23 November 2013}}</ref>
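So, the question is: what am I doing wrong? (I've tested the regular expression several times in RegExr and it seems to work there. Am I messing something up with... the escaping?)
P.S. For those of you who know what I'm talking about: yes, this is part of a Wikipedia infobox.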
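A note on the escaping, going only by the "Regex :" line printed above: the closing-tag part reads '\<\\\/\1\>', which asks for a literal backslash in front of the slash, and '\1' points at the first group (the '<br>' alternative) rather than at the tag name, which is captured by group 4. Either of these alone would keep the pattern from matching '</ref>'. Below is a minimal sketch of a simplified pattern, assuming same-named tags are never nested; the function name 'removeTagsWithContent' and the shortened input are made up for illustration.

```php
<?php
// Hypothetical helper (not from the original post): removes <br>, <br/>,
// self-closing <ref ... />, and paired tags together with their content.
// Assumption: tags of the same name are not nested inside each other.
function removeTagsWithContent(string $str): string {
    $r = '~<br\s*/?>'                          // <br>, <br/>, <br />
       . '|<ref\s[^<>]*/>'                     // self-closing <ref name="..." />
       . '|<([a-z][a-z0-9]*)\b[^<>]*>.*?</\1>' // <tag ...>content</tag>, \1 = tag name
       . '~is';                                // i: case-insensitive, s: dot matches newlines
    return preg_replace($r, '', $str);
}

// Shortened stand-in for the string in the question.
echo removeTagsWithContent('385,178<ref name="land area">See {{cite web |url=http://www.ssb.no/}}</ref>'), "\n";
```

Using '~' as the delimiter means the slash in '</' needs no escaping at all, which sidesteps the backslash counting inside the single-quoted PHP string.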
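What should the end result be? Also, why don't you simply use 'strip_tags()'? Doesn't it meet your requirements? If not, why not?
You shouldn't parse HTML with regular expressions. What problem are you running into with 'strip_tags()'?
@AmalMurali The initial string ('$s') without any tags (and without the content of those tags).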
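To make the difference discussed in these comments concrete: 'strip_tags()' removes only the tags themselves and keeps the text between them, so on its own it does not produce the result asked for. A small sketch, using a shortened, made-up sample string rather than the full one from the question:

```php
<?php
// Shortened sample input (an assumption for illustration only).
$sample = '385,178<ref name="land area">Data is accessible ...</ref>';

// strip_tags() drops the markup but keeps the enclosed text.
$stripped = strip_tags($sample);
echo $stripped, "\n";   // 385,178Data is accessible ...

// Removing the <ref> element together with its content needs something else,
// for example a targeted pattern (assumes <ref> elements are not nested).
$removed = preg_replace('~<ref\b[^<>]*>.*?</ref>~is', '', $sample);
echo $removed, "\n";    // 385,178
```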