2013-02-22 148 views
2

PHP DOMDocument: parsing HTML

I have the following HTML markup:

<div contenteditable="true" class="text"></div> 
<div contenteditable="true" class="text"></div> 
<div style="display: block;" class="ui-draggable"> 
    <img class='avatar' src=""/> 
    <p style=""> 
    <img class='pic' src=""/><br> 
    <span class='fulltext' style="display:none"></span> 
    </p>-<span class='create'></span> 
    <a class='permalink' href=""></a> 
    </div> 
<div contenteditable="true" class="text"></div> 
<div style="display: block;" class="ui-draggable"> 
    <img class='avatar' src=""/> 
    <p style=""> 
    <img class='pic' src=""/><br> 
    <span class='fulltext' style="display:none"></span> 
    </p><span class='create'></span><a class='permalink' href=""></a> 
    </div> 

There can be more of the parent divs. To parse the information and insert it into the DB, I use the following code:

$dom = new DOMDocument(); 
$dom->loadHTML($xml); 
$xpath = new DOMXPath($dom); 
$div = $xpath->query('//div'); 
$i=0; 
$q=1; 
foreach($div as $book) { 
    $attr = $book->getAttribute('class'); 
    //if div contenteditable 
    if($attr == 'text') { 
     echo '</br>'.$book->nodeValue."</br>"; 
    } 

    else { 
     $new = new DOMDocument(); 
     $newxpath = new DOMXPath($new); 
     $avatar = $xpath->query("(//img[@class='avatar']/@src)[$q]"); 

     $picture = $xpath->query("(//p/img[@class='pic']/@src)[$q]"); 
     $fulltext = $xpath->query("(//p/span[@class='fulltext'])[$q]"); 
     $permalink = $xpath->query("(//a[@class='permalink'])[$q]"); 
     echo $permalink->item(0)->nodeValue; //date 
     echo $permalink->item(0)->getAttribute('href'); 
     echo $fulltext->item(0)->nodeValue; 
     echo $avatar->item(0)->value; 
     echo $picture->item(0)->value; 
     $q++; 
    } 
    $i++; 
} 

But I think there is a better way to parse the HTML. Is there? Thank you in advance.

+1

`$avatar = $avatar;` is useless – artragis 2013-02-22 12:03:22

+0

Yes, I missed that. Thanks – lam3r4370 2013-02-22 12:06:29

Answers

6

Note that DOMXPath::query supports a second parameter, the context node. You also don't need a second DOMDocument and DOMXPath inside the loop. Use:

$avatar = $xpath->query("img[@class='avatar']/@src", $book); 

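to get the src attribute node of the <img> relative to the div node. Because the expression has no leading //, it is evaluated relative to $book rather than the whole document, so the $q counter is no longer needed. If you follow this advice, your example should work fine.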


Here is a version of the code that follows the advice above:

$dom = new DOMDocument(); 
$dom->loadHTML($xml); 

$xpath = new DOMXPath($dom); 
$divs = $xpath->query('//div'); 

foreach($divs as $book) { 
    $attr = $book->getAttribute('class'); 
    if($attr == 'text') { 
     echo '</br>'.$book->nodeValue."</br>"; 
    } else { 
     $avatar = $xpath->query("img[@class='avatar']/@src", $book); 
     $picture = $xpath->query("p/img[@class='pic']/@src", $book); 
     $fulltext = $xpath->query("p/span[@class='fulltext']", $book); 
     $permalink = $xpath->query("a[@class='permalink']", $book); 
     echo $permalink->item(0)->nodeValue; //date 
     echo $permalink->item(0)->getAttribute('href'); 
     echo $fulltext->item(0)->nodeValue; 
     echo $avatar->item(0)->value; 
     echo $picture->item(0)->value; 
    } 
} 
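
One caveat with the loop above: DOMNodeList::item(0) returns null when a relative query matches nothing, so a div that lacks, say, a pic image raises a "Trying to get property of non-object" notice. Below is a minimal sketch of a guarded variant. The helper name firstValue and the sample markup are assumptions made for illustration and are not part of the original code; it also assumes PHP 7.1 or later for the nullable return type.

```php
<?php
// Hypothetical helper: returns the string value of the first node matched by
// a relative XPath expression, or null when nothing matches.
function firstValue(DOMXPath $xpath, string $expr, DOMNode $context): ?string
{
    $nodes = $xpath->query($expr, $context);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->nodeValue;
}

// Sample markup (assumed): the second draggable div has no "pic" image.
$html = <<<HTML
<div class="text">note</div>
<div class="ui-draggable">
  <img class="avatar" src="a.png"/>
  <p><img class="pic" src="p.png"/><span class="fulltext">hello</span></p>
  <a class="permalink" href="/1">today</a>
</div>
<div class="ui-draggable">
  <img class="avatar" src="b.png"/>
  <p><span class="fulltext">world</span></p>
  <a class="permalink" href="/2">yesterday</a>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Collect one row per draggable div; missing pieces become null
// instead of triggering an error.
$rows = [];
foreach ($xpath->query("//div[@class='ui-draggable']") as $book) {
    $rows[] = [
        'avatar'    => firstValue($xpath, "img[@class='avatar']/@src", $book),
        'picture'   => firstValue($xpath, "p/img[@class='pic']/@src", $book),
        'fulltext'  => firstValue($xpath, "p/span[@class='fulltext']", $book),
        'permalink' => firstValue($xpath, "a[@class='permalink']/@href", $book),
    ];
}
print_r($rows);
```

Collecting the values into an array first also makes the later DB insert easier to separate from the parsing step.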
+0

"Trying to get property of non-object" on `echo $picture->..`, `echo $fulltext->..` – lam3r4370 2013-02-22 12:34:29

+0

Can you post the full HTML to pastebin? – hek2mgl 2013-02-22 12:40:13

+0

http://pastebin.com/jej88mbz - the full HTML filled with sample data – lam3r4370 2013-02-22 12:47:31

1

In fact, you are doing it right: HTML should be parsed with DOM objects. Still, a few optimizations are possible:

$div = $xpath->query('//div'); 

is rather greedy; getElementsByTagName should be more appropriate:

$div = $dom->getElementsByTagName('div'); 
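
As a quick check that the two calls select the same elements, the following sketch runs both against a small sample document (assumed for illustration) and compares the results node by node. It says nothing about speed or memory use; it only shows that //div and getElementsByTagName('div') both yield every div in document order, nested ones included.

```php
<?php
// Sample markup (assumed for illustration), including a nested div.
$html = '<div class="text">a</div>'
      . '<div class="ui-draggable"><div class="inner">b</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$viaXPath = $xpath->query('//div');
$viaTag   = $dom->getElementsByTagName('div');

// Compare node identity pairwise, stopping at the first mismatch.
$same = $viaXPath->length === $viaTag->length;
for ($i = 0; $same && $i < $viaXPath->length; $i++) {
    $same = $viaXPath->item($i)->isSameNode($viaTag->item($i));
}

echo $viaXPath->length, ' divs, same nodes: ', $same ? 'yes' : 'no', "\n";
```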
+0

I have doubts about the use of `$q` – lam3r4370 2013-02-22 12:09:04

+0

@artragis Note that both statements return the same value, under all circumstances. – hek2mgl 2013-02-22 13:30:29

+1

getElementsByTagName is cached, so it is less greedy in memory. Let me find the message on the @internals list and show it to you as evidence. – artragis 2013-02-22 13:39:30