2012-05-30 66 views
-4

How do I get the URLs on a website that contain given keywords?

For example, I want to capture every anchor href on the page http://www.catererglobal.com/rzwritingajobad.html
that contains any of the keywords (promote, job).

The expected results include:

http://www.catererglobal.com/recruiters/rz-promote-your-brand
http://www.catererglobal.com/recruiters/rz-job-advertising

+5

It's not very clear what exactly you are asking for. –

+1

Are you looking for a "related articles"-style system? – Death

+2

@webbandit I think "not very clear" is being very generous. –

Answer

0

Here is how I would do it in PHP =)

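<?php
// Collect libxml parse warnings internally instead of printing them;
// real-world HTML is rarely well-formed.
$oldSetting = libxml_use_internal_errors(true);
libxml_clear_errors();

// Fetch and parse the page.
$html = new DOMDocument();
$html->loadHTMLFile('http://www.catererglobal.com/rzwritingajobad.html');

// Select every anchor that has an href attribute.
$xpath = new DOMXPath($html);
$links = $xpath->query('//a[@href]');

foreach ($links as $link) {
    $cur = $link->getAttribute('href');
    // Print the href if it contains either keyword (case-sensitive).
    if (preg_match('/promote|job/', $cur)) {
        echo "$cur\n";
    }
}

// Discard the collected errors and restore the previous libxml setting.
libxml_clear_errors();
libxml_use_internal_errors($oldSetting);
?>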

The output is:

http://www.catererglobal.com/recruiters/rz-job-advertising/10298792/post-a-job/ 
/recruiters/rz-job-advertising 
/recruiters/rz-promote-your-brand 
/moreterms/job-location 
http://www.madgex.com/job-boards/ 
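
Some of the hrefs above are root-relative (for example /recruiters/rz-job-advertising), while the question expects absolute URLs. A minimal sketch of one way to normalize them follows. The helper name `toAbsoluteUrl` is hypothetical and not part of the original answer, and it assumes only absolute, protocol-relative, root-relative, and simple path-relative hrefs; it does not normalize `../` segments.

```php
<?php
// Hypothetical helper (not from the original answer): turn an href found on
// $pageUrl into an absolute URL. Does not normalize "../" or "./" segments.
function toAbsoluteUrl(string $href, string $pageUrl): string
{
    // Already absolute: starts with a scheme such as http:// or https://.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $href)) {
        return $href;
    }

    $parts  = parse_url($pageUrl);
    $origin = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $origin .= ':' . $parts['port'];
    }

    // Protocol-relative: //example.com/path
    if (strpos($href, '//') === 0) {
        return $parts['scheme'] . ':' . $href;
    }

    // Root-relative: /path
    if (strpos($href, '/') === 0) {
        return $origin . $href;
    }

    // Path-relative: resolve against the directory of the current page.
    $dir = isset($parts['path'])
        ? preg_replace('#/[^/]*$#', '/', $parts['path'])
        : '/';
    return $origin . $dir . $href;
}

$page = 'http://www.catererglobal.com/rzwritingajobad.html';
echo toAbsoluteUrl('/recruiters/rz-job-advertising', $page), "\n";
echo toAbsoluteUrl('http://www.madgex.com/job-boards/', $page), "\n";
```

Calling this on `$cur` inside the loop before echoing would make every printed link absolute.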

XPath is our best friend ;)
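
The keyword filter can also be pushed into the XPath expression itself with `contains()`, so no regular expression is needed. The following is a rough sketch, not the original answer's method: the function name `findLinksByKeyword` and the inline sample HTML are made up for illustration, and keywords are assumed to contain no double quotes.

```php
<?php
// Hypothetical helper: return the href of every <a> whose href contains
// at least one of the given keywords (case-sensitive, as in XPath 1.0).
function findLinksByKeyword(string $html, array $keywords): array
{
    if ($keywords === []) {
        return [];
    }

    // Suppress libxml warnings about malformed HTML while parsing.
    $old = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($old);

    // Builds e.g. //a[contains(@href, "promote") or contains(@href, "job")]
    $conditions = [];
    foreach ($keywords as $keyword) {
        // Assumption: $keyword contains no double quotes.
        $conditions[] = sprintf('contains(@href, "%s")', $keyword);
    }
    $query = '//a[' . implode(' or ', $conditions) . ']';

    $xpath = new DOMXPath($doc);
    $hrefs = [];
    foreach ($xpath->query($query) as $link) {
        $hrefs[] = $link->getAttribute('href');
    }
    return $hrefs;
}

// Made-up sample markup standing in for the fetched page.
$sample = '<html><body>'
    . '<a href="/recruiters/rz-promote-your-brand">Promote</a>'
    . '<a href="/about">About</a>'
    . '<a href="/recruiters/rz-job-advertising">Jobs</a>'
    . '</body></html>';

$found = findLinksByKeyword($sample, ['promote', 'job']);
print_r($found);
```

For a live page, the HTML string could come from `file_get_contents()` on the URL, or `loadHTML` could be swapped for `loadHTMLFile` as in the answer above.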