Stuck traversing the HTML DOM on a page

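OK. I'm stuck again, and all I can find online are generic tutorials on traversing an HTML DOM. I have this page (http://www.nasdaqomxbaltic.com/market/?pg=news&news_id=250910), and what I want to do is put the text "The statement of shareholders for shares sale and for shares purchase attached." and the attached files into a variable. I'm trying to do this in the most efficient way, so I'm not using simple_html_dom. I'd rather not use XPath if I have a choice, unless it would be faster, but I don't know :)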

Edit: I tried Phil's code, but I can't figure out why it still isn't working.

<?php
$dom = new DOMDocument();
@$dom->loadHTMLFile("http://www.nasdaqomxbaltic.com/market/?pg=news&news_id=250910");

$xpath = new DOMXPath($dom);
$paragraph = $xpath->query('//table[@id="previewTable"]/tbody/tr[2]/td/p'); // tried removing tbody, doesn't fix it; why is it there?
if ($paragraph->length == 1) { // what is this?
    $sentence = $paragraph->nodeValue;
    print_r($sentence); // doesn't work (blank)
}
$links = $xpath->query('//table[@id="previewTable"]//td[@class="tdAttachment"]//a');
foreach ($links as $link) {
    $linkName = $link->nodeValue;
    $linkUrl = $link->getAttribute('href');
    echo $linkName;
    echo $linkUrl; // works
}
?>
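The "what is this?" comment and the blank output both come down to what DOMXPath::query() returns: always a DOMNodeList, never a single node. A minimal sketch, using made-up markup that only borrows the previewTable id from the question: ->length is the number of matched nodes, and nodeValue has to be read from a node taken out of the list with item() or foreach, because DOMNodeList itself has no nodeValue property.

```php
<?php
// Keep libxml's parser warnings out of the output.
libxml_use_internal_errors(true);

// Hypothetical markup; only the table id comes from the question.
$html = '<table id="previewTable"><tr><td>Header</td></tr>'
      . '<tr><td><p>First paragraph.</p><p>Second paragraph.</p></td></tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// query() always returns a DOMNodeList, even when only one node matches.
$nodes = $xpath->query('//table[@id="previewTable"]/tr[2]/td/p');

// ->length is the number of matched nodes, not a string length.
$count = $nodes->length;

// Take a node out of the list before reading its text.
$first = $nodes->item(0); // DOMElement, or null if nothing matched
$firstText = $first !== null ? $first->nodeValue : '';

// Or walk every match.
$allTexts = [];
foreach ($nodes as $node) {
    $allTexts[] = $node->nodeValue;
}

echo $count, ': ', $firstText, "\n";
```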


2011-08-14 Josh

? – Josh

Nope, it doesn't work like that. – calumbrodie

You don't need 'file_get_contents()', just use 'DOMDocument::loadHTMLFile()'. – Phil
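To illustrate Phil's point that DOMDocument::loadHTMLFile() reads the source itself, here is a small sketch. It also swaps the question's @ error suppression for libxml_use_internal_errors(), which keeps parser warnings available for inspection instead of discarding them. So that it runs offline, it writes hypothetical markup to a temporary file. For the real page you would pass the URL instead, which typically also requires allow_url_fopen to be enabled.

```php
<?php
// Buffer libxml parse warnings instead of silencing them with @.
libxml_use_internal_errors(true);

// Offline stand-in for the real page: hypothetical HTML in a temp file.
$path = tempnam(sys_get_temp_dir(), 'dom');
file_put_contents($path, '<html><body><p>Hello from a local file</p></body></html>');

$dom = new DOMDocument();
$loaded = $dom->loadHTMLFile($path); // reads the file itself; no file_get_contents() needed

// Inspect (or log) whatever the parser complained about, then reset the buffer.
$errors = libxml_get_errors();
libxml_clear_errors();
unlink($path);

echo $loaded ? 'loaded' : 'failed', ', ', count($errors), " parser warning(s)\n";
```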

It really depends on how stable the markup is.

Assuming the structure is fairly static, to retrieve the sentence, try

$paragraphs = $xpath->query('//table[@id="previewTable"]/tr[2]/td/p'); 
if ($paragraphs->length > 0) { // check to make sure we got at least one node 
    $sentence = $paragraphs->item(0)->nodeValue; 
}
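As an alternative to checking ->length and calling item(0), DOMXPath::evaluate() can wrap the path in the XPath string() function. That function converts a node-set to the text of its first node, so you get a plain PHP string back, or an empty string when nothing matches. The sketch below runs against hypothetical markup shaped like the structure this answer assumes. Only the sentence text comes from the question.

```php
<?php
libxml_use_internal_errors(true);

// Hypothetical markup shaped like the structure assumed in the answer.
$html = '<table id="previewTable"><tr><td>Title</td></tr>'
      . '<tr><td><p>  The statement of shareholders for shares sale and for shares purchase attached. </p></td></tr>'
      . '</table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// string(...) yields the text of the first matched node as a PHP string.
$sentence = trim($xpath->evaluate('string(//table[@id="previewTable"]/tr[2]/td/p)'));

// A path that matches nothing gives an empty string rather than an error.
$missing = $xpath->evaluate('string(//table[@id="noSuchTable"]//p)');

echo $sentence, "\n";
```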

Retrieving the links is slightly more involved. I would approach it like this:

$links = $xpath->query('//table[@id="previewTable"]//td[@class="tdAttachment"]//a'); 
foreach ($links as $link) { 
    $linkName = $link->nodeValue; 
    $linkUrl = $link->getAttribute('href'); 

    // do something with these values 
}
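The question asks for the attached files "in a variable," so one option is to fill the "do something with these values" step by collecting each link into a single array of name/URL pairs. This is only a sketch. The tdAttachment class comes from the answer's XPath, but the file names and hrefs in the markup are invented. If the real hrefs turn out to be relative, they would need to be joined with the site's base URL before downloading.

```php
<?php
libxml_use_internal_errors(true);

// Hypothetical attachment cell; file names and URLs are made up.
$html = '<table id="previewTable"><tr><td class="tdAttachment">'
      . '<a href="/files/sale.pdf">sale.pdf</a> '
      . '<a href="/files/purchase.pdf">purchase.pdf</a>'
      . '</td></tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Gather every attachment link into one array.
$attachments = [];
foreach ($xpath->query('//table[@id="previewTable"]//td[@class="tdAttachment"]//a') as $link) {
    $attachments[] = [
        'name' => trim($link->nodeValue),
        'url'  => $link->getAttribute('href'),
    ];
}

print_r($attachments);
```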


2011-08-14 23:28:45 Phil

Thanks. Updated. – Josh

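@Josh My bad, I forgot to get the first paragraph. Also remove the 'tbody'; you shouldn't use a DOM inspector to read the source code, because the browser adds tbody elements that aren't in the raw HTML. I've updated my answer. – Phil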
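Phil's tbody remark explains the "why is it there?" comment in the question. Browsers insert an implicit tbody into the DOM they show in the inspector, but PHP's libxml-based parser builds the tree from the raw markup. The sketch below compares three paths against hypothetical tbody-less source. It assumes the real page also omits tbody in its raw HTML, which the thread suggests but does not show directly.

```php
<?php
libxml_use_internal_errors(true);

// Hypothetical raw source with no <tbody>, as PHP would receive it.
$html = '<table id="previewTable"><tr><td>Row 1</td></tr><tr><td><p>Row 2</p></td></tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Path copied from a browser inspector, which shows an implicit <tbody>.
$withTbody = $xpath->query('//table[@id="previewTable"]/tbody/tr[2]/td/p')->length;

// libxml keeps the rows as direct children of <table>.
$withoutTbody = $xpath->query('//table[@id="previewTable"]/tr[2]/td/p')->length;

// A descendant step (//) matches either way, at the cost of being less specific.
$either = $xpath->query('//table[@id="previewTable"]//tr[2]/td/p')->length;

echo "$withTbody / $withoutTbody / $either\n";
```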

+1, this looks good to me. – alex
