Regular expression to match links containing "Google"

I want to use a PHP regular expression to match all links that contain the word google. I tried this:

$url = "http://www.google.com"; 
$html = file_get_contents($url); 
preg_match_all('/<a.*(.*?)".*>(.*google.*?)<\/a>/i',$html,$links); 
echo '<pre />'; 
print_r($links); // it should return 2 links 'About Google' & 'Go to Google English'

But it returns nothing. Why?


2011-03-06 yuli chika

The "problem" here is that you are using a regular expression when perfectly good parsers and XPath are available. – 2011-03-06 10:36:00

You should use a DOM parser, because using regular expressions on HTML documents can go "painfully" wrong. Try something like this:

// Collect libxml parse errors internally instead of printing warnings
libxml_use_internal_errors(TRUE); 

$url="http://www.google.com"; 
$html=file_get_contents($url); 


$doc = new DOMDocument(); 
$doc->loadHTML($html); 
$n=0; 
foreach ($doc->getElementsByTagName('a') as $a) { 
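    // Check whether the anchor's href contains 'google' and print it out.
    // strpos() returns 0 for a match at the start, so compare against false explicitly.
    if ($a->hasAttribute('href') && strpos($a->getAttribute('href'), 'google') !== false) {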
     echo "Anchor" . ++$n . ': '. $a->getAttribute('href') . '<br>'; 
    } 
}


2011-03-06 10:53:09 Francesco

Wahoo, DOM can do this. Thank you very much. I learned something new. – 2011-03-06 11:04:48

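This is not what the OP wanted (at least judging by his code). He seems to want the links whose *text* contains Google, not the URL. But since this is the accepted answer... either he did not state it correctly or he does not care. – 2011-03-06 11:08:05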
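The distinction raised in the comment above (matching on the link text rather than on the href) could be sketched with the same DOM loop. This is only an illustration: the `$html` string is made-up sample markup standing in for the fetched page, and `$matches` is a hypothetical variable name.

```php
<?php
// Hypothetical sample markup standing in for the fetched page (an assumption).
$html = '<html><body>'
      . '<a href="/intl/en/about.html">About Google</a>'
      . '<a href="/ncr">Go to <b>Google</b> English</a>'
      . '<a href="http://www.google.com/preferences">Settings</a>'
      . '</body></html>';

// Keep libxml parse warnings out of the output.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

$matches = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    // textContent also includes the text of nested elements such as <b>.
    // stripos() is case-insensitive; compare against false because a match
    // at position 0 would otherwise be treated as "not found".
    if (stripos($a->textContent, 'google') !== false) {
        $matches[] = $a->textContent;
    }
}

print_r($matches);
```

With this sample markup the "Settings" link is skipped even though its href contains google, because only the visible text is tested.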

Even better, use XPath here:

$url="http://www.google.com"; 
$html=file_get_contents($url); 

$doc = new DOMDocument; 
$doc->loadHTML($html); 

$xpath = new DOMXPath($doc); 
$query = "//a[contains(translate(text(), 'GOOGLE', 'google'), 'google')]"; 
// or just: 
// $query = "//a[contains(text(),'Google')]"; 
$links = $xpath->query($query);

$links will be a DOMNodeList that you can iterate over.
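As a rough sketch of that iteration, the following runs the query against a small inline document instead of a live page. The `$html` sample and the `$found` array are assumptions made for the example. It also uses `.` in place of `text()`, a deliberate variation on the query above: in XPath 1.0, `contains(text(), ...)` looks only at the first text node of the element, so a link such as `<a>Go to <b>Google</b> English</a>` would be missed.

```php
<?php
// Hypothetical sample markup standing in for the fetched page (an assumption).
$html = '<html><body>'
      . '<a href="/about">About Google</a>'
      . '<a href="/ncr">Go to <b>Google</b> English</a>'
      . '<a href="/preferences">Settings</a>'
      . '</body></html>';

// Keep libxml parse warnings out of the output.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
// "." is the string value of the whole <a> element, so text inside nested
// tags is included; translate() lower-cases the letters of "GOOGLE".
$query = "//a[contains(translate(., 'GOOGLE', 'google'), 'google')]";
$links = $xpath->query($query);

$found = [];
foreach ($links as $link) {
    // Each matched node is a DOMElement for an <a> tag.
    $found[$link->getAttribute('href')] = trim($link->textContent);
}

print_r($found);
```

Collecting the results into an array keyed by href is just one convenient shape; echoing each link inside the loop would work equally well.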


2011-03-06 10:27:29
