2013-05-13 79 views
1

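PHP and XPath query

I need to extract some values and some raw HTML from an HTML document. I thought about using XPath, but I can't get my query to work.

Here is what I am trying to achieve: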

<div class="unit-id"> 
    <div class="title"> 
     some title-1 
    </div> 

    <div class="another-class"> 
     another class 
    </div> 
    <p>segwegw1<p> 
    <p>segwegw1<p> 
    <p>segwegw1<p> 
    <p>segwegw1<p> 
    <ul> 
    <li>jfjfj</li> 
    <li>jfjfj</li> 
    <li>jfjfj</li> 
    </ul> 
</div> 


<div class="unit-id"> 
    <div class="title"> 
     some title-2 
    </div> 
    <div class="another-class"> 
     some other class 
    </div> 
    <p>segwegw2<p> 
    <p>segwegw2<p> 
    <p>segwegw2<p> 
    <p>segwegw2<p> 
</div> 


<div class="unit-id"> 
    <div class="title"> 
     some title-3 
    </div> 
    <div class="some-other-class"> 
     some other data 
    </div> 
    <p>segwegw3<p> 
    <p>segwegw3<p> 
    <p>segwegw3<p> 
    <p>segwegw3<p> 
</div> 

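So I want the query to iterate over each div with the class unit-id and return the value of the div with the class title, plus the rest of the HTML other than the divs (the p tags and the ul) that belongs to that particular unit-id div, and then move on to the next iteration.

Is this possible? Could you give me an example of how to write this query? Is there a better way to do this?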

+1

What have you attempted so far? Stack Overflow is not for writing your code, but for fixing problems you have, and you have no code to show. – Kivylius 2013-05-13 17:11:09

+0

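I have created a query that returns the collection of div elements with the class unit-id (query("//div[@class='unit-id']")), but then I need to return all the following non-div elements up to the next div with the class 'unit-id'. This is what I am struggling with. Is there a better way than using an XPath query? – daktau 2013-05-13 18:30:40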

+0

@Jessica - Some of the best questions asked on StackOverflow were asked without showing any unsuccessful code. See this link: http://meta.stackexchange.com/questions/122986/is-it-ok-to-leave-what-have-you-tried-comments – 2013-05-13 19:35:37

Answers

2

This code does roughly what you are looking for:

function get_content($data){
    $doc = new DOMDocument();
    // load the HTML string into the document object (@ suppresses parser warnings)
    if (! @$doc->loadHTML($data)){
        return FALSE;
    }
    // create an XPath object using the document object as the parameter
    $xpath = new DOMXPath($doc);
    $query = "//div[@class='unit-id']";
    // XPath queries return a DOMNodeList
    $res = $xpath->query($query);
    $out = array();
    foreach ($res as $key => $node){
        // subquery, evaluated relative to the current unit-id div (note the leading dot)
        $sub = $xpath->query('.//div[@class="title"]', $node);
        // guard against a unit-id div that has no title div
        $out[$key]['title'] = $sub->length ? trim($sub->item(0)->nodeValue) : NULL;
        // text of every paragraph inside this unit
        foreach ($node->getElementsByTagName('p') as $key2 => $value){
            $out[$key]['par'][$key2] = $value->nodeValue;
        }
        // text of every list item inside this unit
        foreach ($node->getElementsByTagName('li') as $key2 => $value){
            $out[$key]['list'][$key2] = $value->nodeValue;
        }
    }
    return $out;
}

Note that there is an error in your HTML: the closing paragraph tags are missing their slash. They should be </p>, not <p>.

Here is the output:

array 
    0 => 
    array 
     'title' => string 'some title-1' (length=12) 
     'par' => 
     array 
      0 => string 'segwegw1' (length=8) 
      1 => string 'segwegw1' (length=8) 
      2 => string 'segwegw1' (length=8) 
      3 => string 'segwegw1' (length=8) 
     'list' => 
     array 
      0 => string 'jfjfj' (length=5) 
      1 => string 'jfjfj' (length=5) 
      2 => string 'jfjfj' (length=5) 
    1 => 
    array 
     'title' => string 'some title-2' (length=12) 
     'par' => 
     array 
      0 => string 'segwegw2' (length=8) 
      1 => string 'segwegw2' (length=8) 
      2 => string 'segwegw2' (length=8) 
      3 => string 'segwegw2' (length=8) 
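
The question also asked for the remaining raw HTML of each unit (the p tags and the ul), while the function above returns only text values. Below is a rough sketch of one way that might be done, not part of the original answer. The function name get_units_with_html and the sample string are made up for illustration, and it assumes the paragraphs and lists are direct children of the unit-id div with correctly closed tags.

```php
<?php
// Hypothetical helper (not from the answer above): for each unit-id div,
// return the title text plus the raw HTML of every non-div child element.
function get_units_with_html(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return [];
    }

    $xpath = new DOMXPath($doc);
    $units = [];
    foreach ($xpath->query("//div[@class='unit-id']") as $unit) {
        // Direct child div with class "title", relative to the current unit.
        $title = $xpath->query("./div[@class='title']", $unit)->item(0);
        $rest = [];
        // "./*[not(self::div)]" selects direct element children that are not divs.
        foreach ($xpath->query('./*[not(self::div)]', $unit) as $child) {
            // saveHTML() with a node argument serializes only that node.
            $rest[] = $doc->saveHTML($child);
        }
        $units[] = [
            'title' => $title ? trim($title->nodeValue) : null,
            'html'  => $rest,
        ];
    }
    return $units;
}

// Made-up sample input, shortened from the HTML in the question.
$sample = '<div class="unit-id"><div class="title"> some title-1 </div>'
        . '<div class="another-class">another class</div>'
        . '<p>segwegw1</p><ul><li>jfjfj</li></ul></div>';
print_r(get_units_with_html($sample));
```

Each entry in 'html' holds the serialized markup of one child element, so the p and ul blocks can be echoed back or stored unchanged.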
+0

Great, this helped me out a lot. It was doing a subquery on the initial query that had me confused. Cheers! – daktau 2013-05-15 09:20:35
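
On the subquery point: the second argument to DOMXPath::query() is a context node, and whether it has any effect depends on how the expression starts. The snippet below is a small illustration with made-up sample HTML (two units titled A and B). A path starting with ".//" searches only inside the context node, while a path starting with "//" searches from the document root regardless of the context node.

```php
<?php
// Made-up sample: two unit-id divs, each with one title div.
$html = '<div class="unit-id"><div class="title">A</div></div>'
      . '<div class="unit-id"><div class="title">B</div></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Use the first unit-id div as the context node for both subqueries.
$first = $xpath->query("//div[@class='unit-id']")->item(0);

// Relative path: the leading "." anchors the search at $first.
$relative = $xpath->query(".//div[@class='title']", $first);
// Absolute path: "//" starts at the document root, so $first has no effect.
$absolute = $xpath->query("//div[@class='title']", $first);

echo $relative->length, "\n"; // 1
echo $absolute->length, "\n"; // 2
```

This is why the subquery in get_content() begins with a dot: without it, every unit would pick up the title of the first unit in the document.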