2012-09-16 54 views

Grab titles from this array

Right, I'm trying to grab the links in a web page. The links are stored in the array $links.

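Is there a way for me to grab the title of each link in the array (using the function below)? Would that be a multidimensional array? How can I do this?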

$links = Array(); 
$URL = 'http://www.theqlick.com'; // change it for urls to grab 
// grabs the urls from URL 
$file = file_get_html($URL); 
foreach ($file->find('a') as $theelement) { 
    $links[] = url_to_absolute($URL, $theelement->href); 
} 
print_r($links); 


function getTitle($links)
{
    //change it for the original titles.
    $str = file_get_contents("http://www.theqlick.com");
    if (strlen($str)>0) {
        preg_match("/\<title\>(.*)\<\/title\>/", $str, $title);
        return $title[1];
    }
}

$metatitle = getTitle(); 
echo $metatitle; 

Parsing HTML with regular expressions is a bad idea. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Jessedc
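
To illustrate the comment's point, here is a minimal sketch that reads the title with PHP's bundled DOM extension instead of a regex. The helper name `extractTitle` is hypothetical (it is not part of the question's code), and the sample markup is made up for the demonstration.

```php
<?php
// Hypothetical helper: pull the <title> text out of an HTML string with DOMDocument.
function extractTitle(string $html): ?string
{
    // loadHTML() rejects an empty string, so handle that case up front.
    if (trim($html) === '') {
        return null;
    }

    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them;
    // real-world markup is often malformed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = $doc->getElementsByTagName('title');
    if ($nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// A title with an attribute and line breaks: the pattern
// /\<title\>(.*)\<\/title\>/ from the question would not match this.
$html = "<html><head><title lang=\"en\">\n  The Qlick\n</title></head><body></body></html>";
echo extractTitle($html), "\n";
```

The function returns null instead of raising an undefined-index notice when a page has no title. Note that `loadHTML()` may mis-decode non-ASCII titles when the page does not declare its character set, so treat this as a starting point.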

Answer


I don't have the right libraries installed to test this, but it should give you an idea of where to start:

$links = Array(); 
$URL = 'http://www.theqlick.com'; // change it for urls to grab 
// grabs the urls from URL 
$file = file_get_html($URL); 
foreach ($file->find('a') as $theelement) { 
    $link = array(); 
    $link['url'] = url_to_absolute($URL, $theelement->href); 
    $link['title'] = getTitle($link['url']); 
    $links[] = $link; 
} 
print_r($links); 

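function getTitle($url)
{
    // file_get_html() returns false when the page cannot be loaded or parsed.
    $file = file_get_html($url);
    if (!$file) {
        return null;
    }
    // find('title', 0) returns the first <title> element, or null if there is none.
    $title = $file->find('title', 0);
    $text = $title ? trim($title->plaintext) : null;
    $file->clear(); // free the parsed DOM before fetching the next page
    return $text;
}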
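
So yes, the result is a multidimensional array: a list in which every entry is an associative array with a 'url' and a 'title' key. The sketch below shows that shape in isolation and skips duplicate URLs so that no page is downloaded twice. `buildLinkTable` is a hypothetical helper, and the stand-in fetcher with its URLs and titles is made-up data so the example runs without network access. In practice you would pass the `getTitle()` function above instead.

```php
<?php
// Hypothetical helper: pair each unique URL with its title.
// $fetchTitle is any callable that takes a URL and returns a string or null.
function buildLinkTable(array $urls, callable $fetchTitle): array
{
    $table = [];
    // array_unique() avoids fetching the same page more than once.
    foreach (array_unique($urls) as $url) {
        $table[] = ['url' => $url, 'title' => $fetchTitle($url)];
    }
    return $table;
}

// Stand-in fetcher with made-up data (assumption: not real page titles).
$fakeTitles = [
    'http://www.theqlick.com/' => 'Home',
    'http://www.theqlick.com/about' => 'About',
];
$fetch = function (string $url) use ($fakeTitles): ?string {
    return $fakeTitles[$url] ?? null;
};

$links = buildLinkTable(
    ['http://www.theqlick.com/', 'http://www.theqlick.com/about', 'http://www.theqlick.com/'],
    $fetch
);

foreach ($links as $link) {
    // Each entry is an associative array with 'url' and 'title' keys.
    echo $link['url'], ' => ', $link['title'] ?? '(no title)', "\n";
}
```

Keeping the fetcher as a parameter is only a design convenience here: it lets you swap in a real downloader, a cache, or a test double without changing the loop.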