2012-03-25

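PHP: Remove all hyperlinks to a specific domain from text

Two things:

  1. Remove all hyperlinks pointing to mydomain.com and keep all other hyperlinks that do not belong to this domain.

  2. For all remaining links, take the text between the <a> and </a> tags and add it to the link as its id.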

1. Regarding the first task:

I have this:

$str = 'I have been searching <a href="http://www.google.com">Google</a> for all the valuable information. I have also tried <a href="http://www.yahoo.com">Yahoo</a> and I finally, ended up finding it at 
<font size="1">My Site <a style="color:#0000ff;font-family:Arial,Helvetica,sans-serif" href="http://www.mydomain.com/go.php?offer=fine&amp;pid=10" target="_blank" >My Link</a></font>. So you can visit <a href="http://www.mydomain.com/go.php?offer=ok" target="_blank">My Link</a>'; 

I want this:

$str = 'I have been searching <a href="http://www.google.com">Google</a> for all the valuable information. I have also tried <a href="http://www.yahoo.com">Yahoo</a> and I finally, ended up finding it at . So you can visit '; 

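What I tried:

I tried the preg_replace below, but it strips the <a> tags from every link, not just the ones pointing to mydomain.com. I only want it to remove the mydomain.com links and leave everything else as it is.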

$pattern = "/<a[^>]*>(.*)<\/a>/iU"; 
$final_str = preg_replace($pattern, "$1", $str); 
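The pattern above matches every <a> element because nothing in it looks at the href. One way to keep a regex but decide link by link is preg_replace_callback plus parse_url. This is a minimal sketch, assuming href values are always double-quoted; mydomain.com is the placeholder domain from the question, and its subdomains (such as www.) are treated as the same site. Matching links are dropped together with their text, as the desired output requires.

```php
<?php
// Sketch: match each <a>...</a>, then keep or drop it based on the host in its href.
// Assumption: href values are double-quoted.
$str = 'Try <a href="http://www.google.com">Google</a> or <a href="http://www.mydomain.com/go.php?offer=ok" target="_blank">My Link</a>.';

// \b stops "<abbr>" and similar tags from matching; U makes the quantifiers lazy.
$pattern = '/<a\b[^>]*>.*<\/a>/iU';

$final_str = preg_replace_callback($pattern, function (array $m): string {
    // Extract the href value from the matched anchor.
    if (preg_match('/\bhref="([^"]*)"/i', $m[0], $href)) {
        $host = parse_url($href[1], PHP_URL_HOST);
        // Drop the whole link (tags and text) for mydomain.com and its subdomains.
        if (is_string($host) && preg_match('/(^|\.)mydomain\.com$/i', $host)) {
            return '';
        }
    }
    return $m[0]; // every other link is left untouched
}, $str);

echo $final_str;
```

Checking the parsed host rather than a string prefix also avoids matching a URL such as http://www.mydomain.com.example.org.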

2. Regarding the second task:

In the end, I want to end up with this:

$str = 'I have been searching <a href="http://www.google.com" id="Google">Google</a> for all the valuable information. I have also tried <a href="http://www.yahoo.com" id="Yahoo">Yahoo</a> and I finally, ended up finding it at . So you can visit '; 

The answer to both questions: http://php.net/manual/en/class.domdocument.php – PeeHaa 2012-03-25 00:34:01
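A sketch of the DOMDocument route suggested in the comment above, covering both tasks. Assumptions: the text is an HTML fragment in ASCII or ISO-8859-1 (loadHTML needs a charset hint for UTF-8), "mydomain.com" means that host or any subdomain of it, and the id is simply the link text. It leaves the <font> wrapper in place.

```php
<?php
$str = 'I have also tried <a href="http://www.yahoo.com">Yahoo</a> and '
     . '<font size="1">My Site <a href="http://www.mydomain.com/go.php?offer=fine&amp;pid=10">My Link</a></font>.';

$doc = new DOMDocument();
libxml_use_internal_errors(true);          // malformed real-world HTML would otherwise emit warnings
$doc->loadHTML('<div>' . $str . '</div>'); // wrapper div lets us pull the fragment back out
libxml_clear_errors();

// getElementsByTagName() returns a live list, so copy it before removing nodes.
$links = iterator_to_array($doc->getElementsByTagName('a'));
foreach ($links as $a) {
    $host = parse_url($a->getAttribute('href'), PHP_URL_HOST);
    if (is_string($host) && preg_match('/(^|\.)mydomain\.com$/i', $host)) {
        $a->parentNode->removeChild($a);         // task 1: drop the link and its text
    } else {
        $a->setAttribute('id', $a->textContent); // task 2: id = text between the tags
    }
}

// Serialize only the children of the wrapper div (loadHTML added <html><body>).
$wrapper = $doc->getElementsByTagName('div')->item(0);
$result = '';
foreach ($wrapper->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}
echo $result;
```

Because the parser reads attributes properly, attribute order, quoting style and a ">" inside an attribute value do not affect it, and setAttribute escapes the id value if the link text contains quotes.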


Don't try to parse HTML with regular expressions. You will fail (or already are failing). – PeeHaa 2012-03-25 00:34:51
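To make that warning concrete: a hypothetical input with a ">" inside a quoted attribute value (legal HTML) is enough to break the pattern from the question, because [^>]* stops at the first ">" it sees.

```php
<?php
// Hypothetical input: the title attribute contains a literal '>'.
$html = '<a title="x > y" href="http://www.yahoo.com">Yahoo</a>';

// The pattern from the question.
$pattern = "/<a[^>]*>(.*)<\/a>/iU";
$out = preg_replace($pattern, '$1', $html);

// The opening tag is cut off after 'x ', so half of it leaks into the output:
// $out is ' y" href="http://www.yahoo.com">Yahoo' (with a leading space).
echo $out;
```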


Obligatory reference: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – 2012-03-25 00:52:20

Answer


This should do the trick in two steps:

<?php

$str = 'I have been searching <a href="http://www.google.com">Google</a> for all the valuable information. I have also tried <a href="http://www.yahoo.com">Yahoo</a> and I finally, ended up finding it at <font size="1">My Site <a style="color:#0000ff;font-family:Arial,Helvetica,sans-serif" href="http://www.mydomain.com/go.php?offer=fine&amp;pid=10" target="_blank" >My Link</a></font>. So you can visit <a href="http://www.mydomain.com/go.php?offer=ok" target="_blank">My Link</a>'; 

// step 1: remove every link whose href starts with http://www.mydomain.com
// (the dots are escaped so they only match a literal ".")
$pattern1 = '|<a [^>]*href="http://www\.mydomain\.com[^"]*"[^>]*>.*</a>|iU'; 
$str = preg_replace($pattern1, '', $str); 

// step 2: copy the text between <a> and </a> into an id attribute
$pattern2 = '|(<a [^>]+)>(.*)</a>|iU'; 
$str = preg_replace($pattern2, '$1 id="$2">$2</a>', $str); 

echo $str;

Let me know if you also need to get rid of the <font size="1">My Site </font> part.
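The desired output in the question does drop that wrapper as well. One possible extra step, sketched under the assumption that the <font> element only ever holds plain text followed by a single mydomain.com link (as in the sample), is to remove the whole wrapper before running step 1:

```php
<?php
$str = 'ended up finding it at <font size="1">My Site <a href="http://www.mydomain.com/go.php?offer=fine&amp;pid=10" target="_blank" >My Link</a></font>. So you can visit <a href="http://www.mydomain.com/go.php?offer=ok" target="_blank">My Link</a>';

// <font ...>plain text<a ... href="http://www.mydomain.com...">...</a></font>
$fontPattern = '|<font[^>]*>[^<]*<a [^>]*href="http://www\.mydomain\.com[^"]*"[^>]*>.*</a>\s*</font>|iU';
$str = preg_replace($fontPattern, '', $str);

// Then step 1 from the answer handles any bare mydomain.com links that remain.
$pattern1 = '|<a [^>]*href="http://www\.mydomain\.com[^"]*"[^>]*>.*</a>|iU';
$str = preg_replace($pattern1, '', $str);

// $str is now 'ended up finding it at . So you can visit '
echo $str;
```

Anything other than plain text before the link (for example a nested <b>) will not match [^<]*, and the wrapper is then left in place rather than removed by mistake.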
