Trying to remove HTML tags (+ content) from a string

OK, so basically I'm banging my head against the wall with this one: I'm trying to remove HTML tags (and their content) from a string.

The code is below:

<?php 

$s = "385,178<ref name=\"land area\">Data is accessible by following \"Create tables and diagrams\" link on the following site, and then using table 09280 \"Area of land and fresh water (kmÂ²) (M)\" for \"The whole country\" in year 2013 and summing up entries \"Land area\" and \"Freshwater\": {{cite web |url=http://www.ssb.no/en/natur-og-miljo/statistikker/arealdekke |title=Area of land and fresh water, 1 January 2013 |publisher=[[Statistics Norway]] |date=28 May 2013 |accessdate=23 November 2013}}</ref>"; 

function removeHTMLTags($str) { 
    $r = '/(\\<br\\>)|(\\<br\/\\>)|(\\<(.+?)(\\s*[^\\<]+)?\\>(.+)?\\<\\\\\/\\1\\>)|(\\<ref\\sname=([^\\<]+?)\/\\>)/'; 

    echo "Preg_matching : $str\n\n"; 
    echo "Regex : $r\n\n"; 

    return preg_replace($r,'',$str); 
} 

echo removeHTMLTags($s); 

?>

What I'm trying to do is basically get rid of the <ref name="... </ref> part (and any other possible tags as well).

However, this is what I get (i.e. the exact same string, with nothing replaced):

Preg_matching : 385,178<ref name="land area">Data is accessible by following "Create tables and diagrams" link on the following site, and then using table 09280 "Area of land and fresh water (kmÂ²) (M)" for "The whole country" in year 2013 and summing up entries "Land area" and "Freshwater": {{cite web |url=http://www.ssb.no/en/natur-og-miljo/statistikker/arealdekke |title=Area of land and fresh water, 1 January 2013 |publisher=[[Statistics Norway]] |date=28 May 2013 |accessdate=23 November 2013}}</ref> 

Regex : /(\<br\>)|(\<br\/\>)|(\<(.+?)(\s*[^\<]+)?\>(.+)?\<\\\/\1\>)|(\<ref\sname=([^\<]+?)\/\>)/ 

385,178<ref name="land area">Data is accessible by following "Create tables and diagrams" link on the following site, and then using table 09280 "Area of land and fresh water (kmÂ²) (M)" for "The whole country" in year 2013 and summing up entries "Land area" and "Freshwater": {{cite web |url=http://www.ssb.no/en/natur-og-miljo/statistikker/arealdekke |title=Area of land and fresh water, 1 January 2013 |publisher=[[Statistics Norway]] |date=28 May 2013 |accessdate=23 November 2013}}</ref>

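So, the question is: what am I doing wrong? (I've tested the regular expression many times on RegExr and it seems to work there. Am I messing something up with the... escaping?)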
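On the escaping point, a minimal sketch of what PCRE ends up receiving. Inside a single-quoted PHP string, `\\\\\/` collapses to `\\\/`, which PCRE reads as a literal backslash followed by a slash, so the closing-tag part only matches `<\/ref>`, never `</ref>`. In addition, `\1` in the original pattern points at the first group, `(\<br\>)`, rather than at the tag-name group (group 4 by count). The function name `removeTagsWithContent`, its pattern, and the shortened sample string are illustrative assumptions, not a full HTML parser: nested tags of the same name, comments, and `>` inside attribute values are not handled.

```php
<?php
// Closing-tag fragment escaped the way the question's pattern escapes it,
// with the backreference replaced by a literal tag name for this demo.
// After PHP's single-quote unescaping, PCRE receives: /\<\\\/ref\>/
$asWritten = '/\\<\\\\\/ref\\>/';

// Same intent with "~" as the delimiter, so "/" needs no escaping at all.
$simplified = '~</ref>~';

$matchesAsWritten  = preg_match($asWritten, '</ref>');  // requires "<\/ref>"
$matchesSimplified = preg_match($simplified, '</ref>');

// Hypothetical replacement function (name and pattern are illustrative).
function removeTagsWithContent(string $str): string
{
    // Alternative 1: <tag ...>content</tag>, where \1 is the tag name.
    // Alternative 2: standalone tags such as <br>, <br/>, <ref name="x"/>.
    $pattern = '~<(\w+)\b[^>]*(?<!/)>.*?</\1>|<\w+\b[^>]*/?>~s';

    // preg_replace() returns null on error; fall back to the input then.
    return preg_replace($pattern, '', $str) ?? $str;
}

// Shortened sample input, not the full string from the question.
echo removeTagsWithContent('385,178<ref name="land area">Data {{cite web}}</ref>'), "\n";
```

Choosing a delimiter other than `/` and keeping the pattern in a single-quoted string removes most of the backslash layers that made the original hard to read.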

P.S. For those of you who know what I'm talking about: yes, this is part of a Wikipedia infobox.

Source

2014-01-27 Dr.Kameleon

What should the final result be? Also, why don't you simply use `strip_tags()`? Doesn't it meet your requirements? If not, why not?

You shouldn't use regular expressions to process HTML. So what problem are you having with `strip_tags()`?

@AmalMurali The initial string (`$s`) without any of the tags (+ the tags' content) in it.
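For context, a minimal sketch of why `strip_tags()` on its own does not produce that result: it drops the tags themselves but keeps the text between them. The sample string is a shortened stand-in for `$s`, used here only for illustration.

```php
<?php
// Shortened sample input, not the full string from the question.
$sample = '385,178<ref name="land area">Data is accessible</ref>';

// strip_tags() removes <ref ...> and </ref> but leaves the enclosed text.
$stripped = strip_tags($sample);

echo $stripped, "\n"; // prints "385,178Data is accessible"
```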

You really should use the DOM for this kind of thing, because other solutions tend to break easily:

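// Parse the string as HTML; collect libxml warnings (e.g. about the unknown <ref> tag) internally.
$dom = new DOMDocument();
$errorState = libxml_use_internal_errors(true);
$dom->loadHTML($s);

// The parser wraps the leading text in a <p>; take its first text node.
$xpath = new DOMXPath($dom);
$node = $xpath->query('//body/p/text()')->item(0);
echo $node->textContent;

// Restore the previous libxml error-handling setting.
libxml_use_internal_errors($errorState);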
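A slightly more general variant of the same idea, as a sketch under stated assumptions: instead of picking the first text node, remove the unwanted elements (with their content) and return whatever text remains. The helper name `stripElementsWithContent` and its parameters are hypothetical, the sample input is shortened, and non-ASCII input may need an explicit encoding hint for `loadHTML()`, which is left out here.

```php
<?php
// Hypothetical helper: removes every element named in $tagNames, including
// its content, and returns the remaining text of the document body.
function stripElementsWithContent(string $html, array $tagNames): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings internally
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    foreach ($tagNames as $name) {
        // Only allow simple lowercase names, since they go into an XPath query.
        if (!preg_match('/^[a-z][a-z0-9]*$/', $name)) {
            throw new InvalidArgumentException("Unsupported tag name: $name");
        }
        // Snapshot the node list first, then detach each node from its parent.
        foreach (iterator_to_array($xpath->query('//' . $name)) as $node) {
            if ($node->parentNode !== null) {
                $node->parentNode->removeChild($node);
            }
        }
    }

    $body = $dom->getElementsByTagName('body')->item(0);
    return $body === null ? '' : $body->textContent;
}

// Shortened sample input, not the full string from the question.
echo stripElementsWithContent('385,178<ref name="land area">Data {{cite web}}</ref>', ['ref']), "\n";
```

Passing the tag names explicitly keeps ordinary inline markup intact while still dropping elements such as `ref` together with everything inside them.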

Source

2014-01-27 12:28:56 PeeHaa

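I decided to accept *your* answer as the correct one, since it seems to work better. However, there are still problems. Please take a look here: http://stackoverflow.com/questions/21796147/remove-all-html-tagscontent-from-text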
