2014-03-26 42 views

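PHP Simple HTML DOM Parser cannot fetch paginated content

Hi, I'm a beginner using simple_html_dom. I'm trying to get the list of hrefs from the post list on this example site, following its pagination, with the code below.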

<?php 
include('simple_html_dom.php'); 

$html = file_get_html('http://www.themelock.com/wordpress/elegantthemes/'); 

function getArticles($page) { 

    global $articles; 

    $html = new simple_html_dom(); 
    $html->load_file($page); 

    $items = $html->find('h2[class=post-title]'); 

    foreach($items as $post) { 
     $articles[] = array($post->children(0)->href); 
    } 

    foreach($articles as $item) { 
      echo "<div class='item'>"; 
      echo $item[0]; 
      echo "</div>"; 
     } 
} 

if($next = $html->find('div[class=navigation]', 0)->last_child()) { 
    $URL = $next->href; 

    $html->clear(); 
    unset($html); 

    getArticles($URL); 
} 

?> 

As a result I get:

http://www.themelock.com/wordpress/908-minimal-elegantthemes-wordpress-theme.html 
http://www.themelock.com/wordpress/892-event-elegantthemes-wordpress-theme.html 
http://www.themelock.com/wordpress/882-askit-elegantthemes-wordpress-theme.html 
http://www.themelock.com/wordpress/853-lightbright-elegantthemes-wordpress-theme.html 
http://www.themelock.com/wordpress/850-inreview-elegantthemes-review-wordpress-theme.html 
http://www.themelock.com/wordpress/807-boutique-elegantthemes-wordpress-theme.html 
http://www.themelock.com/wordpress/804-elist-elegantthemes-directory-wordpress-theme.html 
http://www.themelock.com/wordpress/798-webly-elegantthemes-wordpress-theme.html 
http://www.themelock.com/wordpress/795-elegantestate-real-estate-elegantthemes-wordpress-theme.html 
http://www.themelock.com/wordpress/786-notebook-elegantthemes-wordpress-theme.html 

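The code above only reads the content of the next (second) page. I would like to know how to get the URLs from the first page and then continue with each following page.

Does anyone know how to do this?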


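You need two loops then: one that loops through the page links, and another that loops through each page's content and extracts the data (which you already have)... Check this answer: http://stackoverflow.com/a/21207159/1519058 – Enissay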
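The two-loop idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the linked answer: it swaps simple_html_dom for PHP's built-in DOMDocument and DOMXPath so it runs without extra files, and it reads pages from a hypothetical in-memory array (`$pages`, with made-up keys `page1` and `page2`) instead of fetching them over HTTP. The function name `collectLinks` is also made up for this sketch.

```php
<?php
// Hypothetical in-memory "site" (URL => HTML). A real script would fetch each page instead.
$pages = [
    'page1' => '<h2 class="post-title"><a href="/a.html">A</a></h2>'
             . '<div class="navigation"><a href="page2">Next</a></div>',
    'page2' => '<h2 class="post-title"><a href="/b.html">B</a></h2>'
             . '<div class="navigation"><a href="page1">Prev</a><span>Next</span></div>',
];

function collectLinks(array $pages, string $start): array
{
    libxml_use_internal_errors(true); // keep warnings about imperfect markup quiet
    $links = [];
    $seen  = [];
    $next  = $start;

    // Outer loop: one iteration per listing page, starting with the first page itself.
    while ($next !== null && isset($pages[$next]) && !isset($seen[$next])) {
        $seen[$next] = true; // guard against navigation that loops back

        $doc = new DOMDocument();
        $doc->loadHTML($pages[$next]);
        $xpath = new DOMXPath($doc);

        // Inner loop: every post link on the current page.
        foreach ($xpath->query('//h2[@class="post-title"]/a/@href') as $href) {
            $links[] = $href->nodeValue;
        }

        // Next page: href of the last child element of the navigation block.
        // If that element has no href (e.g. a disabled "Next"), the loop ends.
        $nav  = $xpath->query('//div[@class="navigation"]/*[last()]/@href');
        $next = $nav->length > 0 ? $nav->item(0)->nodeValue : null;
    }

    return $links;
}

print_r(collectLinks($pages, 'page1'));
```

The point of the structure is that the first page goes through the same outer loop as every later page, so its links are not skipped.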


You just need to change the 'if' to a 'while'. – pguardiario

Answers

1

Thanks for the support, guys. I got this working with the following code:

<?php 
include('simple_html_dom.php'); 

$url = "http://www.themelock.com/wordpress/yootheme-wordpress/"; 

// Start from the main page 
$nextLink = $url; 

// Loop over each next link for as long as one exists 
while ($nextLink) { 
    echo "<hr>nextLink: $nextLink<br>"; 
    //Create a DOM object 
    $html = new simple_html_dom(); 
    // Load HTML from a url 
    $html->load_file($nextLink); 

    $posts = $html->find('h2[class=post-title]'); 

    foreach($posts as $post) { 
     // Get the link 
     $link = $post->children(0)->href; 
     echo $link.'<br>'; 
    } 

    // Extract the next link; NULL ends the loop when there is no navigation block or no further link 
    $nav = $html->find('div[class=navigation]', 0); 
    $nextLink = ($nav && ($temp = $nav->last_child()) && $temp->href) ? $temp->href : NULL; 

    // Clear DOM object 
    $html->clear(); 
    unset($html); 
} 

?> 

To convert relative URLs to absolute ones, [this](http://nadeausoftware.com/sites/NadeauSoftware.com/files/url_to_absolute.zip) did the job for me. Remember to ['rawurldecode'](http://us.php.net/manual/en/function.rawurldecode.php) the result. – Leo
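For anyone who only needs the simple cases, a relative-to-absolute conversion could look something like the sketch below. It is not the linked library: `toAbsoluteUrl` is a made-up helper, the example.com URLs are placeholders, and it assumes the base is a well-formed absolute http(s) URL. It deliberately does not handle `../` segments, query-only links, or fragments, which is where a full resolver earns its keep.

```php
<?php
// Simplified resolver; assumes $base is an absolute URL such as "http://example.com/dir/".
function toAbsoluteUrl(string $base, string $href): string
{
    // Already absolute (has a scheme): return unchanged.
    if (is_string(parse_url($href, PHP_URL_SCHEME))) {
        return $href;
    }

    $parts  = parse_url($base);
    $origin = $parts['scheme'] . '://' . $parts['host']
            . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Protocol-relative ("//host/path"): reuse only the base scheme.
    if (strpos($href, '//') === 0) {
        return $parts['scheme'] . ':' . $href;
    }

    // Root-relative ("/path"): replace the whole base path.
    if (strpos($href, '/') === 0) {
        return $origin . $href;
    }

    // Path-relative ("page/2/"): resolve against the directory of the base path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}

echo toAbsoluteUrl('http://example.com/wordpress/elegantthemes/', 'page/2/'), "\n";
```

In the pagination loop, this would be applied to the next-page href before loading it, with the current page URL as the base.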