2013-10-14 112 views
0

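HTML DOMDocument: get the strings from the paragraphs that follow a tag

I want to parse an HTML document. I need the content of every 'p' that comes after an 'h2'.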

The HTML to parse (example):

<h1>Lorem ipsum</h1> 
<p> 
    Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Donec quam felis, ultricies nec, pellentesque eu, pretium quis, sem. Nulla consequat massa quis enim. Donec pede justo, 
</p> 

<h2>Aenean commodo</h2> 
<p> 
    Aenean commodo ligula eget dolor. Aenean massa. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. 
</p> 

<h2>consectetuer adipiscing</h2> 
<p> 
    Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Donec quam felis, ultricies nec, pellentesque eu, pretium quis, sem. Nulla consequat massa quis enim. Donec pede justo, 
</p> 

Here I want the last two 'p' tags, selected dynamically.


Here is my PHP code:

$dom = new DOMDocument(); 
libxml_use_internal_errors(true); // enable before loading, otherwise parse warnings are still emitted
$dom->loadHTMLFile($html_file); 

$h2_tags = $dom->getElementsByTagName('h2'); 

foreach($h2_tags as $single_tag) { 

    echo $single_tag->textContent;   
    print_r($single_tag); 

} 

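This only gives me the text content of the h2, but I need the 'p' that comes after each h2. Is this possible, or do I need to use a different class?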

Answers

2

You can try the following code:

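$dom = new DOMDocument();
// Enable internal error handling before loading so malformed HTML does not emit warnings.
libxml_use_internal_errors(true);
$dom->loadHTMLFile($html_file);

$xpath = new DOMXPath($dom);
// Select the text nodes of every p that has an h2 somewhere before it in the document.
$nodeList = $xpath->evaluate('//p[preceding::h2]/text()');

foreach ($nodeList as $textNode) {
    echo $textNode->textContent . "<br><br>";
}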

Reference output: http://phpfiddle.org/main/code/7i5-3ir
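Note that `preceding::h2` matches any 'p' with an h2 anywhere earlier in the document, not only the paragraph directly after a heading. If only the paragraph immediately following each h2 is wanted, a stricter expression can be used. The following is a minimal sketch under assumptions: the function name `paragraphsAfterH2` and the inline sample markup are hypothetical, and the HTML is loaded from a string rather than from `$html_file`.

```php
<?php

// Hypothetical helper: returns the text of each p that is the first
// element sibling after an h2.
function paragraphsAfterH2(string $html): array
{
    // Collect libxml warnings internally while parsing, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // following-sibling::*[1] is the first element after the h2;
    // [self::p] keeps it only if that element is a p.
    $nodes = $xpath->query('//h2/following-sibling::*[1][self::p]');

    $result = [];
    foreach ($nodes as $p) {
        $result[] = trim($p->textContent);
    }
    return $result;
}

// Sample markup (an assumption, shortened from the question's example).
$sample = '<h1>Title</h1><p>Intro</p>'
        . '<h2>First</h2><p>Alpha</p>'
        . '<h2>Second</h2><p>Beta</p>';

print_r(paragraphsAfterH2($sample)); // the "Intro" paragraph is not included
```

The paragraph after the h1 is skipped because its nearest preceding element is not an h2.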

0
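<?php

$items = array();

// Collect libxml parse warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);

$document = new DOMDocument;
$document->loadHTMLFile('example.html');

foreach ($document->getElementsByTagName('h2') as $node) {
    // Walk forward through the siblings that follow this h2.
    while ($node = $node->nextSibling) {
        // Skip text nodes (whitespace) and comments.
        if ($node->nodeType == XML_ELEMENT_NODE) {
            // Keep the first element after the h2 only if it is a p.
            if ($node->nodeName == 'p') {
                $items[] = $node->textContent;
            }

            // Stop at the first element either way.
            break;
        }
    }
}

print_r($items);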
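The loop above stops at the first element after each h2, so it returns at most one paragraph per heading. If a heading can be followed by several paragraphs, the same sibling walk can continue until the next h2. This is only a sketch under assumptions: the function name `paragraphsByHeading` and the sample markup are hypothetical, and headings with identical text would overwrite each other in the result.

```php
<?php

// Hypothetical helper: maps each h2 title to the text of all p siblings
// that appear before the next h2.
function paragraphsByHeading(string $html): array
{
    $previous = libxml_use_internal_errors(true);
    $document = new DOMDocument();
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $groups = [];
    foreach ($document->getElementsByTagName('h2') as $heading) {
        $title = trim($heading->textContent);
        $groups[$title] = [];

        for ($node = $heading->nextSibling; $node !== null; $node = $node->nextSibling) {
            if ($node->nodeType !== XML_ELEMENT_NODE) {
                continue; // skip whitespace text nodes and comments
            }
            if ($node->nodeName === 'h2') {
                break; // the next section starts here
            }
            if ($node->nodeName === 'p') {
                $groups[$title][] = trim($node->textContent);
            }
        }
    }
    return $groups;
}

// Sample markup (an assumption, not the question's file).
$sample = '<h1>Title</h1><p>Intro</p>'
        . '<h2>First</h2><p>Alpha</p><div>aside</div><p>Alpha2</p>'
        . '<h2>Second</h2><p>Beta</p>';

print_r(paragraphsByHeading($sample));
```

Non-paragraph elements between headings, such as the div in the sample, are skipped rather than ending the section.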