2013-04-03 31 views
0

So, here is my code (finding text):

<div id="first"> 
<div id="third">Lorem</div> 
    Lorem Ipsum Dolorez [...] 
<script></script> 
    .... 

<div id="second"> 
    Lorem Ipsum[...] 
    <a href=""/> 
</div> 
    .... 
</div> 

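I need to get `Lorem Ipsum Dolorez [...]`, which sits between a div block and a script block, and `Lorem Ipsum[...]`, which is inside the div, but without the hyperlink.

I tried using simple_html_dom.php, but I don't know how to do it.

Edit: this is a live website, so I cannot change this code.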

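Do you want to get the text with JS or PHP? Is there a preference? – ezekielDFM

PHP. As I said in the last line: "I tried using simple_html_dom.php" –

Are those id attributes really there, or were they only added for clarification? – complex857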

Answers

1

You can select these nodes with the DOM library and XPath (explanation in the inline comments):

$html = ' 
    <div id="first"> 
<div id="third">Lorem</div> 
    Lorem Ipsum Dolorez [...] 
    <script></script> 
    this never gets picked up 
    <div id="second"> 
    Lorem Ipsum[...] 
     <a href=""></a> 
     <span> this span is extracted since it is not an anchor element </span> 
    </div> 
    </div>'; 

$doc = new DOMDocument; 
$doc->loadHTML($html); 

$xpath = new DOMXPath($doc); 
$first_lorem = $xpath->query('//div[@id="first"]/div[@id="third"]/following-sibling::text()[following::script]'); 
// first, find the div#first and inside that a div#third ... 
// ... and take text node siblings of that div ... 
// ... if those siblings have a script node following them (so if there's a <script> after them) 

$first_lorem_html = ''; 
// loop the results and concat the html output 
foreach ($first_lorem as $node) { 
    $first_lorem_html .= $doc->saveHTML($node); 
} 
print $first_lorem_html; 

// get every child node of div#second except the elements named 'a' 
$second_lorem = $xpath->query('//div[@id="second"]/node()[name() != "a"]'); 
$second_lorem_html = ''; 
foreach ($second_lorem as $node) { 
    $second_lorem_html .= $doc->saveHTML($node); 
} 
print $second_lorem_html; 
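
If only the plain text is needed rather than the serialized HTML, the same idea can be sketched with `textContent`. This is a minimal sketch, not part of the original answer: `extractText` is a hypothetical helper name, the markup is a trimmed copy of the question's snippet, and the queries assume the `id` attributes really exist on the target page. It also uses `following-sibling::script` instead of `following::script`, and `text()` for the second div so that only direct text nodes are returned (unlike the answer's query, this skips the span as well as the link).

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// run an XPath query and return the trimmed, non-empty text of each match.
function extractText(DOMXPath $xpath, string $query): array
{
    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $text = trim($node->textContent);
        if ($text !== '') {
            $texts[] = $text;
        }
    }
    return $texts;
}

// Trimmed copy of the markup from the question.
$html = '<div id="first"><div id="third">Lorem</div>
    Lorem Ipsum Dolorez [...]
    <script></script>
    <div id="second">
        Lorem Ipsum[...]
        <a href=""></a>
    </div>
</div>';

$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Text nodes after div#third that still have a <script> sibling after them.
$first = extractText(
    $xpath,
    '//div[@id="first"]/div[@id="third"]/following-sibling::text()[following-sibling::script]'
);

// Direct text-node children of div#second only, so the link is skipped.
$second = extractText($xpath, '//div[@id="second"]/text()');

print_r($first);
print_r($second);
```

Trimming and dropping empty strings removes the whitespace-only text nodes that the parser keeps between elements.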

Thank you very much. You made my day. Thanks for explaining the code, especially the first part. –