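Parsing HTML to find certain elements in PHP
I am using cURL to retrieve a page and store the HTML. That part works, and I end up with a variable containing HTML like this (the content of the <td> cells is not fixed; it changes all the time):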
html code above....
<tr class="myclass">
<td>Dynamic Content One</td>
<td>Dynamic Content Two</td>
<td>Dynamic Content Three</td>
</tr>
<tr class="myclass">
<td>Dynamic Content One</td>
<td>Dynamic Content Two</td>
<td>Dynamic Content Three</td>
</tr>
More of the same <tr> ......
html code below....
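My goal now is to parse the HTML and end up with an associative array called $result that stores each <tr> as one element. The array should look like this: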
$result[0]["first_content"] = "Dynamic Content One"
$result[0]["second_content"] = "Dynamic Content Two"
$result[0]["third_content"] = "Dynamic Content Three"
$result[1]["first_content"] = "Dynamic Content One"
$result[1]["second_content"] = "Dynamic Content Two"
$result[1]["third_content"] = "Dynamic Content Three"
... more elements, depending on how many <tr> rows there are
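As a minimal sketch of the target shape, assuming every row has exactly three cells, a flat list of one row's cell texts can be mapped onto the three keys with PHP's built-in array_combine(). The helper name rowToAssoc is hypothetical, chosen only for illustration:

```php
<?php
// Hypothetical helper: turns one row's cell texts into the keyed shape shown above.
// Assumes exactly three cells per row; array_combine() needs arrays of equal length.
function rowToAssoc(array $cells): array
{
    $keys = ['first_content', 'second_content', 'third_content'];
    return array_combine($keys, $cells);
}

$result = [];
$result[] = rowToAssoc(['Dynamic Content One', 'Dynamic Content Two', 'Dynamic Content Three']);
echo $result[0]['second_content'], "\n"; // prints "Dynamic Content Two"
```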
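I'm finding this quite tricky to parse. I have used the DOMDocument and DOMXPath classes, but so far I have only managed to build one flat array containing each <td> as a separate element, and I don't know where in the loop to group the cells into one array per row. Maybe there is a better way to do this? Here is my current code: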
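$dom = new DOMDocument;
// Collect libxml warnings about malformed real-world HTML instead of printing them
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($retrievedHtml);
libxml_clear_errors();
libxml_use_internal_errors($previous);
$xPath = new DOMXPath($dom);
$xPathQuery = "//tr[@class='myclass']";
// query() returns a DOMNodeList, or false if the expression is invalid (never null)
$elements = $xPath->query($xPathQuery);
$results = array();
if ($elements !== false) {
    foreach ($elements as $element) {
        // childNodes holds the <td> elements and any whitespace text nodes between them,
        // so every cell ends up in one flat array
        foreach ($element->childNodes as $node) {
            $results[] = $node->nodeValue;
        }
    }
}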
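One way to get the grouped structure is to collect each row's cells into an inner array and append that to $result, instead of appending every cell to one flat array. The sketch below assumes exactly three <td> cells per matching row (rows with any other count are skipped) and runs a relative XPath query (./td) against each row; the function name parseRows and the second sample row are hypothetical, for illustration only:

```php
<?php
// Sketch: build $result[rowIndex]['first_content' | 'second_content' | 'third_content']
// from every <tr class="myclass">. Assumes three <td> cells per row; other rows are skipped.
function parseRows(string $html): array
{
    $keys = ['first_content', 'second_content', 'third_content'];

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy real-world markup
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xPath = new DOMXPath($dom);
    $result = [];
    foreach ($xPath->query("//tr[@class='myclass']") as $row) {
        $cells = [];
        // "./td" is evaluated relative to $row, so only this row's <td> elements come back
        // (the whitespace text nodes between cells never enter the loop).
        foreach ($xPath->query('./td', $row) as $td) {
            $cells[] = trim($td->textContent);
        }
        if (count($cells) === count($keys)) {
            $result[] = array_combine($keys, $cells);
        }
    }
    return $result;
}

$html = <<<HTML
<table>
<tr class="myclass">
<td>Dynamic Content One</td>
<td>Dynamic Content Two</td>
<td>Dynamic Content Three</td>
</tr>
<tr class="myclass">
<td>Row Two A</td>
<td>Row Two B</td>
<td>Row Two C</td>
</tr>
</table>
HTML;

$result = parseRows($html);
echo count($result), "\n";              // prints 2
echo $result[1]['third_content'], "\n"; // prints "Row Two C"
```

Querying ./td per row is an alternative to filtering childNodes by nodeType: the XPath step only selects <td> elements, so no extra check is needed inside the loop.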
This works. Is there anything I should watch out for? For example, what happens if the nodeType is not XML_ELEMENT_NODE? I'm not sure what that means.
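On the nodeType question: childNodes returns every child node, not only elements. XML_ELEMENT_NODE is a predefined PHP constant (value 1) that identifies element nodes such as <td>; the whitespace between tags can come back as text nodes (XML_TEXT_NODE, value 3) whose nodeValue is just a newline or spaces. A minimal sketch of filtering on nodeType, using a small made-up table for illustration:

```php
<?php
// Sketch: childNodes of a <tr> can include text nodes (the newlines between the tags)
// as well as the <td> element nodes, so filter on nodeType before reading values.
$dom = new DOMDocument();
$previous = libxml_use_internal_errors(true);
$dom->loadHTML("<table><tr class=\"myclass\">\n<td>A</td>\n<td>B</td>\n</tr></table>");
libxml_clear_errors();
libxml_use_internal_errors($previous);

$tr = $dom->getElementsByTagName('tr')->item(0);

$cellsOnly = [];
foreach ($tr->childNodes as $node) {
    if ($node->nodeType === XML_ELEMENT_NODE) {
        // An element node: in this input, always a <td>
        $cellsOnly[] = $node->nodeValue;
    }
    // Anything else (typically an XML_TEXT_NODE holding "\n") is skipped
}

print_r($cellsOnly);
```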