2013-05-16 79 views
1

Extracting movie IDs from IMDB

I'm trying to figure out how to get only the movie IDs from this page.

http://www.imdb.com/movies-in-theaters/2013-05/ 

I have this, but I can't get it to work.

$content = file_get_contents("http://www.imdb.com/movies-in-theaters/"); 

$dom = new DomDocument(); 
$dom->loadHTML($content); 
$urls = $dom->getElementsByTagName('a'); 

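Also, I know very little about DomDocument. This currently gets all the links on the page. However, I only need the movie ID from each movie title link, such as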

http://www.imdb.com/title/tt1869716/ 

where the ID is

tt1869716
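
For a single URL like the one above, the ID can be pulled out with a regular expression. The following is only a minimal sketch: the helper name `extractImdbId` is hypothetical, and it assumes IDs always take the form `tt` followed by digits, as in the example URL.

```php
<?php
// Hypothetical helper (not from the original post): returns the ID segment
// of an IMDB title URL, or null when the URL has no /title/tt<digits> part.
function extractImdbId(string $url): ?string
{
    // Assumption: IDs are "tt" followed by one or more digits.
    if (preg_match('~/title/(tt\d+)~', $url, $matches)) {
        return $matches[1];
    }
    return null;
}

echo extractImdbId('http://www.imdb.com/title/tt1869716/'), "\n"; // prints tt1869716
```

Returning null for non-matching URLs lets the caller skip links that are not title links instead of reading an undefined match.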

+0

Sorry, StackOverflow gave me headaches when I tried to post this in a code block. Here's a gist: https://gist.github.com/ghalusa/5591124 – Gor

+1

Hi Gor, thank you. I tested it and the result is overwhelming. It returned a lot of information. Can we get only the movie IDs? –

+0

I haven't tested it, but the answer below (from @enenen) looks like it handles things nicely. – Gor

Answers

2
function get_url_contents($url) { 
    $crl = curl_init(); 

    curl_setopt($crl, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)'); 
    curl_setopt($crl, CURLOPT_URL, $url); 
    curl_setopt($crl, CURLOPT_RETURNTRANSFER, 1); 
    curl_setopt($crl, CURLOPT_CONNECTTIMEOUT, 5); 

    $ret = curl_exec($crl); 
    curl_close($crl); 
    return $ret; 
} 

function getElementsByClassName(DOMDocument $DOMDocument, $ClassName) 
{ 
    $Elements = $DOMDocument -> getElementsByTagName("*"); 
    $Matched = array(); 

    foreach($Elements as $node) 
    { 
     if(! $node -> hasAttributes()) 
      continue; 

     $classAttribute = $node -> attributes -> getNamedItem('class'); 

     if(! $classAttribute) 
      continue; 

     $classes = explode(' ', $classAttribute -> nodeValue); 

     if(in_array($ClassName, $classes)) 
      $Matched[] = $node; 
    } 

    return $Matched; 
} 

libxml_use_internal_errors(true); 
$content = get_url_contents("http://www.imdb.com/movies-in-theaters/"); 

$dom = new DomDocument(); 
$dom->loadHTML($content); 

$elemsByClassName = getElementsByClassName($dom, 'overview-top'); 

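foreach($elemsByClassName as $elem) { 

    foreach ($elem->getElementsByTagName('a') as $a) { 
     // Capture group 2 holds the ID (e.g. tt1869716). Only echo when the href
     // actually contains a title/ segment, so $matches[2] is always defined.
     if (preg_match('/(title\/)([0-9A-Za-z]+)(\/)?/', $a->getAttribute('href'), $matches)) { 
      echo $a->nodeValue. ' - ' . $matches[2] . '<br/>'; 
     } 
     break; // we need only the first A tag. 
    } 
} 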

Output:

Star Trek Into Darkness (2013) - tt1408101 
Frances Ha (2012) - tt2347569 
Stories We Tell (2012) - tt2366450 
The Expatriate (2012) - tt1645155 
The English Teacher (2013) - tt2055765 
Augustine (2012) - tt2098628 
Black Rock (2012) - tt1930294 
State 194 (2012) - tt2324918 
Iron Man 3 (2013) - tt1300854 
The Great Gatsby (2013) - tt1343092 
Pain & Gain (2013) - tt1980209 
Peeples (2013) - tt1699755 
42 (2013) - tt0453562 
Oblivion (2013) - tt1483013 
The Croods (2013) - tt0481499 
The Big Wedding (2013) - tt1931435 
Mud (2012) - tt1935179 
Oz the Great and Powerful (2013) - tt1623205 
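
If only the IDs are needed, without the titles, one possible variation is to query the links with DOMXPath and collect the IDs into an array. This is a sketch under assumptions, not the method used above: the function name `extractMovieIds` and the sample markup are hypothetical, and it assumes every title link contains `/title/tt` followed by digits. It works on an HTML string, so the result of `get_url_contents()` could be passed in instead of the sample.

```php
<?php
// Hypothetical helper: collects the unique IMDB title IDs linked from an HTML string.
function extractMovieIds(string $html): array
{
    libxml_use_internal_errors(true); // keep warnings about sloppy markup quiet
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $ids = array();

    // Select every <a> whose href contains "/title/tt".
    foreach ($xpath->query('//a[contains(@href, "/title/tt")]') as $a) {
        if (preg_match('~/title/(tt\d+)~', $a->getAttribute('href'), $m)) {
            $ids[$m[1]] = true; // array keys deduplicate repeated links
        }
    }

    return array_keys($ids);
}

// Made-up sample markup, loosely modelled on the 'overview-top' blocks above.
$sample = '<div class="overview-top"><h4><a href="/title/tt1408101/">Star Trek Into Darkness (2013)</a></h4>'
        . '<a href="/title/tt1408101/reviews">reviews</a></div>'
        . '<div class="overview-top"><h4><a href="/title/tt2347569/">Frances Ha (2012)</a></h4></div>'
        . '<a href="/name/nm0000001/">someone</a>';

print_r(extractMovieIds($sample));
```

For the sample above this yields two IDs, tt1408101 and tt2347569: the second link to the same title is deduplicated and the `/name/` link is ignored.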
+0

Keith, upvote enenen's answer, will you? Because it works really well :) – Gor

+0

This is great! Thank you for your help! –
