Grab all links from a page

I want to grab all the links (href values) on a page.

This is my actual code:

preg_match_all('/href=.([^"\' ]+)/i', $content, $anchor);

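But this only catches domains and subdomains (such as name.name.ex or name.ex); it does not grab custom URLs like name.ex/name/name.php.

Can anyone please help with the regular expression?

Source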

2013-12-22 Mirko Brombin

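You could list all the domains (i.e. .com, .org, .net, etc.) and then preg_match_all them. Here is a wiki page listing all top-level domains: http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains – Enijar

I would advise against using a regular expression for this. I suggest you use DOM parsing to get your results.

Below is an example using DOM and XPath: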

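// Sample markup; in practice $html would hold the fetched page.
$html = '<a href="name.ex/name/name.php">text</a>
     <a href="foo.com">foobar</a>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

$links = []; // initialise so print_r() works even when no anchors are found
foreach ($xpath->query('//a') as $link) {
    $links[] = $link->getAttribute('href');
}

print_r($links);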

See the working demo.
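As a possible variation on the DOM and XPath approach above, the query can select the href attributes directly, so anchors that have no href are skipped. This is only a sketch: the third anchor in the sample markup is an assumption added for illustration, not part of the original answer.

```php
<?php
// Sample markup from the answer above, plus one hypothetical anchor without href.
$html = '<a href="name.ex/name/name.php">text</a>
         <a href="foo.com">foobar</a>
         <a name="top">no href here</a>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

$links = [];
// '//a/@href' selects the href attribute nodes themselves,
// so anchors without an href never enter the loop.
foreach ($xpath->query('//a/@href') as $attr) {
    $links[] = $attr->nodeValue;
}

print_r($links);
```

Selecting the attribute nodes avoids the empty strings that getAttribute('href') returns for anchors lacking the attribute.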

Source

2013-12-22 14:49:29 hwnd

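Try this regular expression:

// Single-quoted PHP string so the double quotes in the pattern need no escaping.
$pattern = '/href="([^\s"]+)/';
preg_match_all($pattern, $content, $matches);

if (count($matches[1])) {
    foreach ($matches[1] as $match) {
        echo $match . "<br />";
    }
}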
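The pattern above only matches href values wrapped in double quotes. If the page may also use single quotes, one option is a backreference to the opening quote. This is a minimal sketch, assuming the sample $content below (made up for illustration); it still inherits the usual fragility of parsing HTML with a regular expression.

```php
<?php
// Hypothetical sample input mixing both quote styles.
$content = '<a href="name.ex/name/name.php">a</a> <a href=\'foo.com\'>b</a>';

// Group 1 remembers the opening quote; \1 requires the same quote to close.
$pattern = '/href\s*=\s*(["\'])(.*?)\1/i';
preg_match_all($pattern, $content, $matches);

// Group 2 holds the URLs without the surrounding quotes.
foreach ($matches[2] as $url) {
    echo $url . PHP_EOL;
}
```

Note that the URLs are in $matches[2] here, because group 1 is used for the quote character.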

Source

2013-12-22 12:25:30 di3sel

It doesn't work; it doesn't add the URLs. –

I added the full code, which works for me. Please check – di3sel

Here you go!

$string = "<a href='test.php/url' class=>test</a>testar <a href='test2.php/url2' class=>test</a>";
// Capture the href value of each <a> tag, without its surrounding quotes.
$pattern = '/<a\s[^>]*?href=["\']?([^"\' >]+)/i';

preg_match_all($pattern, $string, $matches);

foreach ($matches[1] as $match) {
    echo $match . PHP_EOL;
}

Source

2013-12-22 12:46:54

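An easier example using DOMDocument:

$doc = new DOMDocument();
@$doc->loadHTML($html); // @ suppresses warnings about malformed markup

$linkNodes = $doc->getElementsByTagName('a');

$urls = []; // initialise so print_r() works even when no anchors are found
foreach ($linkNodes as $linkNode) {
    $urls[] = $linkNode->getAttribute('href');
}

print_r($urls);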
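Instead of silencing loadHTML with @, the parser warnings can be collected through libxml's internal error handling. The following is a rough sketch of that alternative; the deliberately sloppy sample markup is an assumption about what real pages look like, not something from the question.

```php
<?php
// Hypothetical sloppy markup: an unclosed <p> and an unknown <foo> tag.
$html = '<p><a href="name.ex/name/name.php">text</a>
         <foo><a href="foo.com">foobar</a>';

// Collect parser warnings internally instead of silencing them with @.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Discard the collected warnings and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

$urls = [];
foreach ($doc->getElementsByTagName('a') as $linkNode) {
    // Skip anchors that carry no href attribute.
    if ($linkNode->hasAttribute('href')) {
        $urls[] = $linkNode->getAttribute('href');
    }
}

print_r($urls);
```

Unlike @, this does not hide unrelated errors raised by the same statement, and libxml_get_errors() could be called before clearing if the warnings are worth logging.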

Source

2013-12-22 17:13:19
