php - Simple HTML DOM - elements between other elements

I want to write a PHP script that scrapes a website and stores some of its elements in a database.

Here is my problem. A web page is written like this:

<h2>The title 1</h2> 
<p class="one_class"> Some text </p> 
<p> Some interesting text </p> 

<h2>The title 2</h2> 
<p class="one_class"> Some text </p> 
<p> Some interesting text </p> 

<p class="one_class"> Some different text </p> 
<p> Some other interesting text </p> 

<h2>The title 3</h2> 
<p class="one_class"> Some text </p> 
<p> Some interesting text </p>

I want only the h2 elements and the p elements with the interesting text, not the p elements with class="one_class".

I tried this PHP code:

<?php
$numberP = 0;
foreach ($html->find('p') as $p)
{
    $pIsOneClass = PIsOneClass($html, $p);

    if ($pIsOneClass == false)
    {
        echo $p->outertext;
        $h2 = $html->find("h2", $numberP);
        echo $h2->outertext;
        $numberP++;
    }
}
?>

The function PIsOneClass($html, $p) is:

<?php
function PIsOneClass($html, $p)
{
    foreach ($html->find("p.one_class") as $p_one_class)
    {
        if ($p == $p_one_class)
        {
            return true;
        }
    }
    return false;
}
?>

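It doesn't work, and I understand why, but I don't know how to fix it.

How do I say "I want every p without the class that sits between two h2 elements"?

Thanks a lot!

Source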

2014-10-19 Maxime Thizeau

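If they are all 'p.one_class', why not find those 'p' tags and remove them before saving the result? – 2014-10-19 14:07:19

But how can I keep the h2 and p elements in order? With this script it prints h2 p h2 p h2 p, but I want something like h2 p p h2 p – 2014-10-19 14:29:49

This task is easier with XPath, because you are scraping more than one kind of element and want to keep them in source order. You can use PHP's DOM library, which includes DOMXPath, to find and filter the elements you need: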

$html = '<h2>The title 1</h2> 
<p class="one_class"> Some text </p> 
<p> Some interesting text </p> 

<h2>The title 2</h2> 
<p class="one_class"> Some text </p> 
<p> Some interesting text </p> 

<p class="one_class"> Some different text </p> 
<p> Some other interesting text </p> 

<h2>The title 3</h2> 
<p class="one_class"> Some text </p> 
<p> Some interesting text </p>'; 

# create a new DOM document and load the html 
$dom = new DOMDocument; 
$dom->loadHTML($html); 
# create a new DOMXPath object 
$xp = new DOMXPath($dom); 

# search for all h2 elements and all p elements that do not have the class 'one_class' 
$interest = $xp->query('//h2 | //p[not(@class="one_class")]'); 

# iterate through the array of search results (h2 and p elements), printing out node 
# names and values 
foreach ($interest as $i) { 
    echo "node " . $i->nodeName . ", value: " . $i->nodeValue . PHP_EOL; 
}

Output:

node h2, value: The title 1 
node p, value: Some interesting text 
node h2, value: The title 2 
node p, value: Some interesting text 
node p, value: Some other interesting text 
node h2, value: The title 3 
node p, value: Some interesting text

As you can see, the original order is preserved, and it is easy to eliminate the nodes you don't want.

Source
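Since the goal is to store the results, the flat list can also be grouped so that each heading carries the paragraphs that follow it. The following is only a sketch under assumptions: the `$sections` array and its `title`/`paragraphs` keys are illustrative names, not part of the original answer, and the sample HTML is a shortened copy of the page from the question.

```php
<?php
// Hypothetical sample input, mirroring the page structure from the question.
$html = '<h2>The title 1</h2>
<p class="one_class"> Some text </p>
<p> Some interesting text </p>
<h2>The title 2</h2>
<p class="one_class"> Some text </p>
<p> Some interesting text </p>
<p class="one_class"> Some different text </p>
<p> Some other interesting text </p>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

// $sections is an illustrative structure: one entry per h2, in source order.
$sections = [];
foreach ($xp->query('//h2 | //p[not(@class="one_class")]') as $node) {
    if ($node->nodeName === 'h2') {
        // A heading opens a new group.
        $sections[] = ['title' => trim($node->nodeValue), 'paragraphs' => []];
    } elseif ($sections) {
        // A kept paragraph belongs to the most recent heading.
        $sections[count($sections) - 1]['paragraphs'][] = trim($node->nodeValue);
    }
}

print_r($sections);
```

Each entry can then be written to the database as one heading row plus its paragraph rows; paragraphs that appear before the first h2 are skipped in this sketch.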

2014-10-19 15:31:13

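Thanks, I didn't know this existed. Is it possible to use Simple HTML DOM alongside it, or is that pointless? – 2014-10-19 17:57:01

You can't perform XPath operations with Simple HTML DOM, but you can output HTML from DOMDocument and then read it with SHD. You should be able to do everything you want with DOM, though; it is a very comprehensive library for handling XML. [Here is the manual](http://php.net/manual/en/book.dom.php). – 2014-10-20 07:34:57

From the Simple HTML DOM manual: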
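The "output HTML from DOMDocument" route could look roughly like the sketch below, which also follows the earlier suggestion of removing the unwanted p tags first. The sample HTML and the `$unwanted` and `$cleaned` variable names are assumptions made for illustration.

```php
<?php
// Hypothetical sample input.
$html = '<h2>The title 1</h2>
<p class="one_class"> Some text </p>
<p> Some interesting text </p>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

// Copy the matches into an array first so removal does not disturb iteration.
$unwanted = iterator_to_array($xp->query('//p[@class="one_class"]'));
foreach ($unwanted as $p) {
    $p->parentNode->removeChild($p);
}

// Serialize what is left; this string could be handed to another parser.
$cleaned = $dom->saveHTML();
echo $cleaned;
```

Note that `saveHTML()` returns a full document, so the output includes the doctype and the html/body wrapper that `loadHTML()` added. If Simple HTML DOM is loaded, the string should be usable with its `str_get_html()` function.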

[attribute=value]

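Matches elements that have the specified attribute with a certain value. Or

[!attribute]

Matches elements that don't have the specified attribute.

Source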
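For comparison, the same "no such attribute" idea can be expressed in the DOMXPath approach from the other answer. This is a sketch with made-up sample HTML; it also shows a token-aware test, which is an addition not mentioned in the thread, for pages where a paragraph carries several classes. With Simple HTML DOM the first query would presumably be written with the `p[!class]` selector quoted above.

```php
<?php
// Hypothetical sample input: the second paragraph carries two classes.
$html = '<p> Plain </p>
<p class="one_class extra"> Two classes </p>
<p class="other"> Another class </p>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

// Counterpart of [!class]: paragraphs with no class attribute at all.
$noClass = $xp->query('//p[not(@class)]');

// Token-aware test: excludes one_class even when other classes are present.
$notOneClass = $xp->query(
    '//p[not(contains(concat(" ", normalize-space(@class), " "), " one_class "))]'
);

echo $noClass->length . PHP_EOL;     // 1
echo $notOneClass->length . PHP_EOL; // 2
```

The exact comparison `@class="one_class"` only matches when the attribute value is exactly that string, so the token-aware form is the safer choice if the real page mixes classes.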

2014-10-19 14:58:19 Billy
