2011-05-21 52 views
0

Simple_DOM question about finding classes

I'm trying to do a simple extraction, but I keep ending up with unpredictable results.

I have this HTML code:

<div class="thread" style="margin-bottom:25px;"> 

<div class="message"> 

<span class="profile">Suzy Creamcheese</span> 

<span class="time">December 22, 2010 at 11:10 pm</span> 

<div class="msgbody"> 

<div class="subject">New digs</div> 

Hello thank you for trying our soap. <BR> Jim. 

</div> 
</div> 


<div class="message reply"> 

<span class="profile">Lars Jörgenmeier</span> 

<span class="time">December 22, 2010 at 11:45 pm</span> 

<div class="msgbody"> 

I never sold you any soap. 

</div> 

</div> 

</div> 

I'm trying to get the outertext of "msgbody", but only when "profile" equals a given name. Like this:

$contents = $html->find('.msgbody'); 
$elements = $html->find('.profile'); 

      $length = sizeof($contents); 

      while($x != sizeof($elements)) { 

      $var = $elements[$x]->outertext; 

         //If profile = the right name 
      if ($var = $name) { 

            $text = $contents[$x]->outertext; 
       echo $text; 

      } 



      $x++; 
     }  

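I'm getting the text from the wrong profile instead of the associated text I need. Is there a way to pull the needed information with just one line of code?

Like: if span profile = "correct name", then pull its div msgbody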

Answers

3

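OK, I'm going with DOMXPath for this one. I'm not sure exactly what "outertext" is supposed to mean here, but I'll go with this requirement:

Like: if span profile = "correct name", then pull its div msgbody

First, here is the reduced HTML test case I used: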

<html> 
<body> 
<div class="thread" style="margin-bottom:25px;"> 

<div class="message"> 

<span class="profile">Suzy Creamcheese</span> 

<span class="time">December 22, 2010 at 11:10 pm</span> 

<div class="msgbody"> 

<div class="subject">New digs</div> 

Hello thank you for trying our soap. <BR> Jim. 

</div> 
</div> 


<div class="message reply"> 

<span class="profile">Lars Jörgenmeier</span> 

<span class="time">December 22, 2010 at 11:45 pm</span> 

<div class="msgbody"> 

I never sold you any soap. 

</div> 

</div> 

</div> 
</body> 
</html> 

So, we'll build an XPath query for this. Let's show the whole thing and then break it down:

$messages = $xpath->query("//span[@class='profile' and contains(.,'$profile_name')]/../div[@class='msgbody']"); 

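Breakdown:

//span

Give me all spans

//span[@class='profile']

Give me all spans where the class is profile

//span[@class='profile' and contains(.,'$profile_name')]

Give me all spans where the class is profile and the inside of the span contains $profile_name, which is the name you're after

//span[@class='profile' and contains(.,'$profile_name')]/../

Give me all spans where the class is profile and the inside of the span contains $profile_name, which is the name you're after. Now go up a level, which gets us to <div class="message">

//span[@class='profile' and contains(.,'$profile_name')]/../div[@class='msgbody']

Give me all spans where the class is profile and the inside of the span contains $profile_name, which is the name you're after. Now go up a level, which gets us to <div class="message">. Finally, give me all divs under <div class="message"> where the class is msgbody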
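The breakdown above can be checked one step at a time. The following is only a sketch: it assumes the test case is loaded from an inline string (the `$html` variable and the `$steps` and `$counts` arrays are names introduced here for illustration) and simply counts how many nodes each partial query selects.

```php
<?php
// Inline copy of the reduced test case so the sketch is self-contained.
$html = <<<'HTML'
<html>
<body>
<div class="thread" style="margin-bottom:25px;">
<div class="message">
<span class="profile">Suzy Creamcheese</span>
<span class="time">December 22, 2010 at 11:10 pm</span>
<div class="msgbody">
<div class="subject">New digs</div>
Hello thank you for trying our soap. <BR> Jim.
</div>
</div>
<div class="message reply">
<span class="profile">Lars Jörgenmeier</span>
<span class="time">December 22, 2010 at 11:45 pm</span>
<div class="msgbody">
I never sold you any soap.
</div>
</div>
</div>
</body>
</html>
HTML;

libxml_use_internal_errors(true); // keep HTML parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$profile_name = 'Suzy Creamcheese';

// Each entry adds one piece of the final query, in the order of the breakdown.
$steps = [
    "//span",
    "//span[@class='profile']",
    "//span[@class='profile' and contains(.,'$profile_name')]",
    "//span[@class='profile' and contains(.,'$profile_name')]/..",
    "//span[@class='profile' and contains(.,'$profile_name')]/../div[@class='msgbody']",
];

$counts = [];
foreach ($steps as $step) {
    $counts[] = $xpath->query($step)->length; // number of nodes selected so far
    echo $step, " => ", end($counts), " node(s)\n";
}
```

Watching the node count shrink from every span down to the single matching msgbody is a quick way to see which predicate is responsible when a query selects too much or too little.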

So now, here is a sample of the PHP code:

$doc = new DOMDocument(); 
$doc->loadHTMLFile("test.html"); 

$xpath = new DOMXpath($doc); 
$profile_name = 'Lars Jörgenmeier'; 
$messages = $xpath->query("//span[@class='profile' and contains(.,'$profile_name')]/../div[@class='msgbody']"); 
foreach ($messages as $message) { 
    echo trim("{$message->nodeValue}") . "\n"; 
} 
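One caveat with interpolating `$profile_name` straight into the query string: a name containing an apostrophe would end the XPath string literal early. A possible workaround is sketched below. `xpath_literal()` is a hypothetical helper written for this example, not something DOMXPath provides, and the profile name "Mia O'Hara" is made up for the demonstration.

```php
<?php
// Hypothetical helper: turn an arbitrary PHP string into an XPath 1.0 string literal.
function xpath_literal(string $value): string
{
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";          // no apostrophe: single quotes are safe
    }
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';          // no double quote: double quotes are safe
    }
    // Both quote types present: split on apostrophes and rebuild with concat(),
    // passing each apostrophe as its own double-quoted argument.
    $parts = explode("'", $value);
    return "concat('" . implode("', \"'\", '", $parts) . "')";
}

libxml_use_internal_errors(true);
$doc = new DOMDocument();
// Made-up message whose profile name contains an apostrophe.
$doc->loadHTML(
    '<div class="message"><span class="profile">Mia O\'Hara</span>'
    . '<div class="msgbody">Hello.</div></div>'
);
$xpath = new DOMXPath($doc);

$profile_name = "Mia O'Hara";
$query = "//span[@class='profile' and contains(., " . xpath_literal($profile_name) . ")]"
       . "/../div[@class='msgbody']";

$found = [];
foreach ($xpath->query($query) as $message) {
    $found[] = trim($message->nodeValue);
}
print_r($found);
```

XPath 1.0 has no escape character inside string literals, which is why the helper falls back to `concat()` when both kinds of quote appear in the name.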

XPath is very powerful. I suggest you look at a basic tutorial, and if you want to see more advanced usage, you can check the XPath standard.
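Two details are worth a sketch here, both under stated assumptions. First, `contains()` is a substring test, so `normalize-space(.) = '...'` is used below to require an exact match of the whole name. Second, DOMDocument's HTML loader may not treat the input as UTF-8 unless the document declares its charset, which can stop a name like Jörgenmeier from matching. The `<meta>` tag in this inline test document is an addition made for this example and is not part of the original test file.

```php
<?php
// Inline test document; the <meta> charset declaration is an assumption added
// here so the parser reads the bytes as UTF-8.
$html = <<<'HTML'
<html>
<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>
<body>
<div class="thread">
<div class="message">
<span class="profile">Suzy Creamcheese</span>
<div class="msgbody"> Hello thank you for trying our soap. </div>
</div>
<div class="message reply">
<span class="profile">Lars Jörgenmeier</span>
<div class="msgbody"> I never sold you any soap. </div>
</div>
</div>
</body>
</html>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$profile_name = 'Lars Jörgenmeier'; // this source file must itself be saved as UTF-8

// normalize-space(.) trims the span's text; "=" then demands the full name,
// so a partial name such as 'Lars' would select nothing.
$query = "//span[@class='profile' and normalize-space(.)='$profile_name']"
       . "/../div[@class='msgbody']";

$texts = [];
foreach ($xpath->query($query) as $message) {
    $texts[] = trim($message->nodeValue);
}
echo implode("\n", $texts), "\n";
```

If the page cannot be edited, declaring or converting the encoding before calling `loadHTML()` is the usual alternative; which approach fits depends on where the HTML comes from.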

+0

That is a lot of concise information. Thanks for the XPath conversion. I love Simple_DOM, but it bleeds memory! – user734063 2011-05-22 03:17:10

+0

Also, I noticed that you have to insert the following at the top of the document for special characters such as 'Jörgenmeier' to get through XPath. – user734063 2011-05-22 03:57:36

+0

'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">' – user734063 2011-05-22 03:58:12

0

Here is a working example using Simple HTML DOM.

I changed your example HTML so that there are multiple profiles for Suzy Creamcheese, as follows (file: test_class_class.htm):

<div class="message"> 
    <span class="profile">Suzy Creamcheese</span> 
    <span class="time">December 22, 2010 at 11:10 pm</span> 
    <div class="msgbody"> 
    <div class="subject">New digs</div> 
     Hello thank you for trying our soap. <BR> Jim. 
    </div> 
    </div> 

    <div class="message reply"> 
    <span class="profile">Lars Jörgenmeier</span> 
    <span class="time">December 22, 2010 at 11:45 pm</span> 
    <div class="msgbody"> 
     I never sold you any soap. 
    </div> 
    </div> 
</div> 

<div class="message"> 
    <span class="profile">Suzy Yogurt</span> 
    <span class="time">December 22, 2010 at 11:10 pm</span> 
    <div class="msgbody"> 
    <div class="subject">No Creamcheese</div> 
     This is not Suzy Creamcheese <BR> Jim. 
    </div> 
    </div> 

    <div class="message reply"> 
    <span class="profile">Suzy Creamcheese</span> 
    <span class="time">December 22, 2010 at 11:45 pm</span> 
    <div class="msgbody"> 
     A reply from Suzy Creamcheese. 
    </div> 
    </div> 
</div> 

</div> 

Here is my test using Simple HTML DOM:

include('simple_html_dom.php');

function getMessage_for_profile($iUrl,$iProfile) 
{ 
    // create HTML DOM 
    $html = file_get_html($iUrl); 

    // get text elements 
    $aoProfile = $html->find('span[class=profile]'); 
    echo "Found ".count($aoProfile)." profiles.<br />"; 

    foreach ($aoProfile as $key=>$oProfile) 
    { 
        if ($oProfile->plaintext == $iProfile) 
        { 
            echo "<b>Profile ".$key.": ".$oProfile->plaintext."</b><br />"; 
            // Using $e->next_sibling() 
            $oCurrent = $oProfile; 
            while ($oNext = $oCurrent->next_sibling()) 
            { 
                if ($oNext->class == "msgbody") 
                { 
                    echo "<hr />"; 
                    echo $oNext->outertext; 
                    echo "<hr />"; 
                } 
                $oCurrent = $oNext; 
            } 
        } 
    } 

    // clean up memory 
    $html->clear(); 
    unset($html); 

    return; 
} 
// -------------------------------------------- 
// test it! 
// user_agent header... 
ini_set('user_agent', 'My-Application/2.5'); 

getMessage_for_profile('test_class_class.htm','Suzy Creamcheese'); 
echo "<br /><br /><br />"; 
getMessage_for_profile('test_class_class.htm','Suzy Yogurt'); 

My output is:

Found 4 profiles. 
Profile 0: Suzy Creamcheese 
-------------------------------- 
New digs 
Hello thank you for trying our soap. 
Jim. 
--------------------------------- 
Profile 3: Suzy Creamcheese 
--------------------------------- 
A reply from Suzy Creamcheese. 
--------------------------------- 



Found 4 profiles. 
Profile 2: Suzy Yogurt 
--------------------------------- 
No Creamcheese 
This is not Suzy Creamcheese 
Jim. 
--------------------------------- 

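See, it can be done with Simple HTML DOM, because I already know how the DOM works (or at least enough to get into trouble). I did it without having to learn any new syntax!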
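For comparison, the same sibling-walking idea can be sketched with PHP's built-in DOM extension instead of Simple HTML DOM. This is a swapped-in technique rather than the code above: `messages_for_profile()` is a hypothetical function written for this example, and the inline HTML is a tidied, well-formed copy of the four-profile test file.

```php
<?php
// Hypothetical helper: collect the text of every msgbody that follows a
// matching profile span among its siblings (mirrors the next_sibling() loop).
function messages_for_profile(DOMDocument $doc, string $profile): array
{
    $found = [];
    foreach ($doc->getElementsByTagName('span') as $span) {
        if ($span->getAttribute('class') !== 'profile'
            || trim($span->textContent) !== $profile) {
            continue; // not the profile we are looking for
        }
        for ($node = $span->nextSibling; $node !== null; $node = $node->nextSibling) {
            if ($node instanceof DOMElement && $node->getAttribute('class') === 'msgbody') {
                // Collapse runs of whitespace so the text prints on one line.
                $found[] = trim(preg_replace('/\s+/', ' ', $node->textContent));
            }
        }
    }
    return $found;
}

$html = <<<'HTML'
<div class="thread">
  <div class="message">
    <span class="profile">Suzy Creamcheese</span>
    <div class="msgbody"><div class="subject">New digs</div> Hello thank you for trying our soap. <br> Jim.</div>
  </div>
  <div class="message reply">
    <span class="profile">Lars Jörgenmeier</span>
    <div class="msgbody"> I never sold you any soap. </div>
  </div>
</div>
<div class="thread">
  <div class="message">
    <span class="profile">Suzy Yogurt</span>
    <div class="msgbody"><div class="subject">No Creamcheese</div> This is not Suzy Creamcheese <br> Jim.</div>
  </div>
  <div class="message reply">
    <span class="profile">Suzy Creamcheese</span>
    <div class="msgbody"> A reply from Suzy Creamcheese. </div>
  </div>
</div>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);

$suzy = messages_for_profile($doc, 'Suzy Creamcheese');
$yogurt = messages_for_profile($doc, 'Suzy Yogurt');
print_r($suzy);
print_r($yogurt);
```

Because the profile text is compared with `!==` against the whole trimmed string, "Suzy Yogurt" is not picked up when searching for "Suzy Creamcheese", even though her message body mentions that name.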