2013-05-14 78 views
1

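Extracting link text from specific links

I'm trying to figure out how to get the movie titles from this page.

I have the code below, but I can't get it to work, and I know very little about DomDocument. It currently grabs every link on the page, but I only need the links for the listed movie titles.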

$content = file_get_contents("http://www.imdb.com/movies-in-theaters/"); 

$dom = new DomDocument(); 
$dom->loadHTML($content); 
$urls = $dom->getElementsByTagName('a'); 

Answers

2
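$dom = new DOMDocument();
// The page is not strictly valid HTML; @ silences the resulting parser warnings.
@$dom->loadHTMLFile('http://www.imdb.com/movies-in-theaters/');
$urls = $dom->getElementsByTagName('a');
$titles = array();

foreach ($urls as $url)
{
    // Title links sit two levels below an element with class "overview-top".
    $container = $url->parentNode->parentNode;
    // Only elements have attributes; skip links whose grandparent is the document itself.
    if ($container instanceof DOMElement && 'overview-top' === $container->getAttribute('class'))
        $titles[] = $url->nodeValue;
}

print_r($titles);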

This outputs:

Array 
(
    [0] => Star Trek Into Darkness (2013) 
    [1] => Frances Ha (2012) 
    [2] => Stories We Tell (2012) 
    [3] => Erased (2012) 
    [4] => The English Teacher (2013) 
    [5] => Augustine (2012) 
    [6] => Black Rock (2012) 
    [7] => State 194 (2012) 
    [8] => Iron Man 3 (2013) 
    [9] => The Great Gatsby (2013) 
    [10] => Pain & Gain (2013) 
    [11] => Peeples (2013) 
    [12] => 42 (2013) 
    [13] => Oblivion (2013) 
    [14] => The Croods (2013) 
    [15] => The Big Wedding (2013) 
    [16] => Mud (2012) 
    [17] => Oz the Great and Powerful (2013) 
) 

You can do this with XPath as well, but I don't know it well enough to write that version.
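For what it's worth, the XPath route might look roughly like the sketch below. It assumes the same page structure the loop above relies on: each title link is a grandchild of an element whose class attribute is exactly "overview-top". The function name extractTitles and the sample markup are made up for illustration, and the real page may be laid out differently by now.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the text of
// every <a> whose grandparent element has the class "overview-top".
function extractTitles(string $html): array
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of printing them,
    // as an alternative to the @ operator.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // Any element with class="overview-top", then any child element, then <a>.
    $links = $xpath->query('//*[@class="overview-top"]/*/a');

    $titles = array();
    foreach ($links as $link) {
        $titles[] = trim($link->nodeValue);
    }

    return $titles;
}

// Made-up sample markup, used so the sketch runs without a network request.
$sample = '<html><body>'
    . '<div class="overview-top"><h4><a href="/title/tt0000001/">Example Movie (2013)</a></h4></div>'
    . '<a href="/help/">Help</a>'
    . '</body></html>';

print_r(extractTitles($sample));
```

To run it against the live page, you would pass in the result of file_get_contents('http://www.imdb.com/movies-in-theaters/'). Note that @class="overview-top" matches only when that is the element's sole class, which is also how the getAttribute comparison in the loop behaves.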

+1

Thank you very much, this is exactly what I needed. – 2013-05-14 05:49:54

+0

+ "Star Trek" is a very good movie – Baba 2013-05-25 14:25:25