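Parsing HTML to find certain elements in PHP

2016-11-24

I'm using cURL to retrieve a page and store its HTML. That works, and I end up with a variable containing HTML like this (the contents of the <td> elements are not fixed and change all the time):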

html code above.... 

    <tr class="myclass"> 
    <td>Dynamic Content One</td> 
    <td>Dynamic Content Two</td> 
    <td>Dynamic Content Three</td> 
    </tr> 

    <tr class="myclass"> 
    <td>Dynamic Content One</td> 
    <td>Dynamic Content Two</td> 
    <td>Dynamic Content Three</td> 
    </tr> 

    More of the same <tr> ...... 

html code below.... 

My goal now is to parse the HTML into an associative array called $result that stores each <tr> as an element. The array should look like this:

$result[0]["first_content"] = "Dynamic Content One" 
$result[0]["second_content"] = "Dynamic Content Two" 
$result[0]["third_content"] = "Dynamic Content Three" 

$result[1]["first_content"] = "Dynamic Content One" 
$result[1]["second_content"] = "Dynamic Content Two" 
$result[1]["third_content"] = "Dynamic Content Three" 

... more elements in the array, depending on how many <tr> rows there are

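I'm finding this quite tricky to parse. I've used the DOMDocument and DOMXPath classes, but all I've managed to produce is a single flat array containing every <td>, and I don't know where in the loop to build one array per row. Maybe there's a better way to do this? Here is my current code: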

$dom = new DOMDocument;
@$dom->loadHTML($retrievedHtml); // @ hides warnings about malformed HTML

$xPath = new DOMXPath($dom);

$xPathQuery = "//tr[@class='myclass']";
$elements = $xPath->query($xPathQuery);

// query() returns false (never null) for an invalid expression
if ($elements !== false) {

    $results = array();

    foreach ($elements as $element) {

        $nodes = $element->childNodes;

        // Every child node of every row ends up in one flat array
        foreach ($nodes as $node) {
            $results[] = $node->nodeValue;
        }

    }
}

Answer


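To get the output structure you want (minus the literal keys such as "first_content"), add a new dimension to the array for each row and fill that dimension with the row's cells. I think that's what you're trying to achieve, anyway!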

$dom = new DOMDocument;
@$dom->loadHTML($retrievedHtml); // @ hides warnings about malformed HTML

$xPath = new DOMXPath($dom);

$xPathQuery = "//tr[@class='myclass']";
$elements = $xPath->query($xPathQuery);

// query() returns false (never null) for an invalid expression
if ($elements !== false) {

    $results = array();

    foreach ($elements as $index => $element) {

        $nodes = $element->childNodes;

        foreach ($nodes as $node) {
            /* Each table row gets its own level in the array, keyed by $index.
               Only element nodes (the <td>s) are stored. */
            if ($node->nodeType == XML_ELEMENT_NODE) $results[$index][] = $node->nodeValue;
        }
    }

    echo '<pre>', print_r($results, true), '</pre>';
}
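If you also want the literal keys from the question (first_content, second_content, third_content), one option is sketched below. It is an illustration under assumptions, not the answerer's code: the helper name `parseRows` and the sample markup are hypothetical. Instead of walking `childNodes`, it runs the relative XPath query `./td` against each row (the second argument of `DOMXPath::query()` is the context node), so only `<td>` elements are returned, and `array_combine()` attaches the key names. It also replaces the `@` operator with `libxml_use_internal_errors()`, which collects the parser's warnings instead of suppressing all errors.

```php
<?php
// Hypothetical sample markup standing in for the cURL response ($retrievedHtml in the question).
$retrievedHtml = <<<HTML
<table>
  <tr class="myclass">
    <td>Dynamic Content One</td>
    <td>Dynamic Content Two</td>
    <td>Dynamic Content Three</td>
  </tr>
  <tr class="other">
    <td>Ignored</td>
  </tr>
  <tr class="myclass">
    <td>Row Two A</td>
    <td>Row Two B</td>
    <td>Row Two C</td>
  </tr>
</table>
HTML;

/**
 * Hypothetical helper: returns one associative array per <tr class="myclass">,
 * using $keys (one name per <td>, in document order) as the array keys.
 */
function parseRows(string $html, array $keys): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings about sloppy real-world HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xPath = new DOMXPath($dom);
    $rows = $xPath->query("//tr[@class='myclass']");
    if ($rows === false) {
        return []; // query() returns false only for an invalid expression
    }

    $result = [];
    foreach ($rows as $row) {
        // Relative query: only the <td> children of this row, never text nodes.
        $values = [];
        foreach ($xPath->query('./td', $row) as $cell) {
            $values[] = trim($cell->textContent);
        }
        // array_combine() needs equal-length arrays, so skip rows that don't match.
        if (count($values) === count($keys)) {
            $result[] = array_combine($keys, $values);
        }
    }
    return $result;
}

$result = parseRows($retrievedHtml, ['first_content', 'second_content', 'third_content']);
print_r($result);
```

Rows whose number of `<td>` cells differs from the number of keys are skipped; in PHP 8, `array_combine()` would throw a `ValueError` for such a row. Whether skipping is the right policy depends on the page you are scraping.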

This works. Is there anything I should watch out for? For example, what happens if the nodeType isn't XML_ELEMENT_NODE? I'm not sure what that means.
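On the nodeType question: `childNodes` returns every child node of the `<tr>`, not only the `<td>` elements. The line breaks and indentation between the `<td>` tags are normally kept as text nodes (`DOMText`, whose `nodeType` is `XML_TEXT_NODE`), while the cells themselves are element nodes (`XML_ELEMENT_NODE`). Without the check, those whitespace strings would end up in the results as extra entries, which is likely what made the flat array in the question messy. The small sketch below, using made-up markup, collects both kinds side by side so the difference is visible.

```php
<?php
// Made-up sample row with line breaks between the cells, like the question's HTML.
$html = "<table><tr class=\"myclass\">\n  <td>A</td>\n  <td>B</td>\n</tr></table>";

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings quiet
$dom->loadHTML($html);
libxml_clear_errors();

$row = $dom->getElementsByTagName('tr')->item(0);

$allChildren = []; // every child node, labelled by type (what the question's loop visits)
$cells = [];       // only element nodes (what the answer's XML_ELEMENT_NODE check keeps)
foreach ($row->childNodes as $node) {
    $isElement = $node->nodeType === XML_ELEMENT_NODE;
    $allChildren[] = ($isElement ? 'element: ' : 'other: ') . json_encode($node->nodeValue);
    if ($isElement) {
        $cells[] = $node->nodeValue;
    }
}

print_r($allChildren); // typically lists whitespace text nodes between the <td> entries
print_r($cells);       // just the cell contents
```

Filtering on `XML_ELEMENT_NODE` is therefore a safe default here: it also skips HTML comments (`XML_COMMENT_NODE`) if the page contains any inside a row.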