2015-04-23 39 views

How can I scrape each data value inside ul li tags using PHP?

I have a web page whose HTML looks like this:

<ul class ='trainList'> 
<li> 
    <div class="smallFont farelist no-discount "> 
     <div class="train-no">ABC 701</div> 
     <div class="train-time">06:10<br>07:15</div> 
     <div class="train-info"> 
      <div class="box"> 
       <div class="total-price">MYR 50.00</div> 
       <div class="farediscount"> 
        <div class="actual-fare-price">Array</div> 
        <div class="train-discount"></div> 
       </div> 
      </div> 
</li> 
<li> 
    <div class="smallFont farelist no-discount "> 
     <div class="train-no">ABC 701</div> 
     <div class="train-time">06:10<br>07:15</div> 
     <div class="train-info"> 
      <div class="box"> 
       <div class="total-price">MYR 50.00</div> 
       <div class="farediscount"> 
        <div class="actual-fare-price">Array</div> 
        <div class="train-discount"></div> 
       </div> 
      </div> 
</li> 

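I want to scrape the markup above and extract the train number, the train time, and the train price.

My code does not return the information I want; it just gives me an empty result. I have looked through many previously posted questions but could not find anything similar.

My code: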

$train_doc = new DOMDocument(); 

libxml_use_internal_errors(TRUE); 

if(!empty($html)){ 

    $train_doc->loadHTML($html); 

    libxml_clear_errors(); 

    $train_xpath = new DOMXPath($train_doc); 


    $train_list = array(); 

$train = $train_xpath->query('//div[@class="smallFont farelist no-discount"]'); 
var_dump($train); 
if($train->length > 0){ 


    foreach($train as $pat){ 

     $name = $train_xpath->query('div[@class="train-no"]', $pat)->item(0)->nodeValue; 

     $train_types = array(); 
     $types = $train_xpath->query('div[@class="train-time"]/a', $pat); 


     foreach($types as $type){ 
      $train_types[] = $type->nodeValue; 


     $train_list[] = array('name' => $name, 'types' => $train_types); 

    } 
} 
} 

echo "<pre>"; 
print_r($train_list); 
echo "</pre>"; 

Try using this library: http://simplehtmldom.sourceforge.net/

Answers


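You need to point at the container element first: select the div inside each li, then query the elements you need relative to it: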

$train_list = array(); 
$train = $train_xpath->query('//li/div[contains(@class, "smallFont farelist no-discount")]'); 
if($train->length > 0) { 
    foreach($train as $t) { 
     $time_s = $train_xpath->evaluate('string(./div[@class="train-time"]/text()[1])', $t); 
     $time_e = $train_xpath->evaluate('string(./div[@class="train-time"]/text()[2])', $t); 
     $train_list[] = array(
      'train_no' => $train_xpath->evaluate('string(./div[@class="train-no"])', $t), 
      'train_time' => "$time_s - $time_e", 
      'train_price' => $train_xpath->evaluate('string(./div[@class="train-info"]/div/div[@class="total-price"])', $t), 
     ); 
    } 
} 

Sample Output
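A likely reason the query in the question came back empty is that `@class="..."` is an exact string comparison, and the class attribute in the markup ends with a space (`"smallFont farelist no-discount "`). Separately, `div[@class="train-time"]/a` looks for `<a>` children that the markup does not contain. The sketch below compares three ways of matching the container. The sample HTML is a trimmed copy of the question's markup with the missing closing tags added, which is an assumption about what the real page looks like.

```php
<?php
// Trimmed sample based on the question's markup (closing tags added as an assumption).
$html = <<<HTML
<ul class='trainList'>
<li>
  <div class="smallFont farelist no-discount ">
    <div class="train-no">ABC 701</div>
    <div class="train-time">06:10<br>07:15</div>
    <div class="train-info"><div class="box">
      <div class="total-price">MYR 50.00</div>
    </div></div>
  </div>
</li>
<li>
  <div class="smallFont farelist no-discount ">
    <div class="train-no">ABC 701</div>
    <div class="train-time">06:10<br>07:15</div>
    <div class="train-info"><div class="box">
      <div class="total-price">MYR 50.00</div>
    </div></div>
  </div>
</li>
</ul>
HTML;

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Exact comparison: the attribute value has a trailing space, so this finds nothing.
$exact = $xpath->query('//div[@class="smallFont farelist no-discount"]');

// normalize-space() strips leading/trailing whitespace before the comparison.
$trimmed = $xpath->query('//div[normalize-space(@class)="smallFont farelist no-discount"]');

// contains() is a substring test, so the trailing space does not matter.
$loose = $xpath->query('//div[contains(@class, "farelist")]');

printf("exact: %d, trimmed: %d, loose: %d\n", $exact->length, $trimmed->length, $loose->length);
```

Note that `contains(@class, "farelist")` is a plain substring test, so it would also match a class such as `farelist-old`; the `normalize-space()` form is stricter if that matters for the page being scraped.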

libxml_use_internal_errors(true); 
$page = new DOMDocument(); 
$page->preserveWhiteSpace = false; 
$page->loadHTML($html); 
$xpath = new DomXPath($page); 

foreach($xpath->query("//*[contains(@class, 'train-time')]") as $element){ 

     print_r($element->nodeValue); 

} 

Hope this helps.
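One thing to watch with this approach: `nodeValue` on the `train-time` div joins the text on both sides of the `<br>`, so the two times come back run together as `06:1007:15`. A minimal sketch of one way to keep them separate is to loop over the div's text-node children instead. The one-line `$html` fragment and the `$rows` variable are illustrative, not part of the original code.

```php
<?php
// Illustrative fragment; the real page would be loaded into $html instead.
$html = '<div class="train-time">06:10<br>07:15</div>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$rows = [];
foreach ($xpath->query("//*[contains(@class, 'train-time')]") as $element) {
    $times = [];
    // Visit only the text-node children, skipping the <br> element between them.
    foreach ($xpath->query('./text()', $element) as $text) {
        $value = trim($text->nodeValue);
        if ($value !== '') {
            $times[] = $value;
        }
    }
    $rows[] = $times;   // one entry per train-time div: [departure, arrival]
}

print_r($rows);
```

Each entry in `$rows` then holds the departure and arrival times as separate strings, which can be joined with whatever separator is needed.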
