Find text

So, here is my code:

<div id="first"> 
<div id="third">Lorem</div> 
    Lorem Ipsum Dolorez [...] 
<script></script> 
    .... 

<div id="second"> 
    Lorem Ipsum[...] 
    <a href=""/> 
</div> 
    .... 
</div>

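I need to get Lorem Ipsum Dolorez [...], which sits between the closing tag of the inner div block and the script block, and Lorem Ipsum[...], which is inside the second div, but without the hyperlink.

I tried using simple_html_dom.php, but I don't know how to do it.

Edit: this is a website, so I can't change this code.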

2013-04-03 G.Z

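Do you want to get the text with JS or PHP? Is there a preference? – ezekielDFM

PHP. As I said in the last line, "I tried using simple_html_dom.php".

Are these id attributes really there, or were they only added for clarification? – complex857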

You can select these nodes with the DOM library and XPath (the explanation is in the inline comments):

$html = ' 
    <div id="first"> 
<div id="third">Lorem</div> 
    Lorem Ipsum Dolorez [...] 
    <script></script> 
    this never gets picked up 
    <div id="second"> 
    Lorem Ipsum[...] 
     <a href=""></a> 
     <span> this span is extracted since it is not an anchor element </span> 
    </div> 
    </div>'; 

$doc = new DOMDocument; 
$doc->loadHTML($html); 

$xpath = new DOMXPath($doc); 
$first_lorem = $xpath->query('//div[@id="first"]/div[@id="third"]/following-sibling::text()[following::script]'); 
// first, find the div#first and inside that a div#third ... 
// ... and take text node siblings of that div ... 
// ... if those siblings have a script node following them (so if there's a <script> after them) 

$first_lorem_html = ''; 
// loop the results and concat the html output 
foreach ($first_lorem as $node) { 
    $first_lorem_html .= $doc->saveHTML($node); 
} 
print $first_lorem_html; 

// get every child node of div#second except the elements named 'a' 
$second_lorem = $xpath->query('//div[@id="second"]/node()[name() != "a"]'); 
$second_lorem_html = ''; 
foreach ($second_lorem as $node) { 
    $second_lorem_html .= $doc->saveHTML($node); 
} 
print $second_lorem_html;
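
Since the markup comes from a live website, the same idea can be wrapped in a small helper that tolerates invalid HTML. This is only a sketch and not part of the answer above: extract_text() is a hypothetical name, the sample $html is a shortened stand-in for the real page, and the second query uses text() to keep only the direct text of div#second (so, unlike the node()[name() != "a"] query, it would also drop a span).

```php
<?php
// Hypothetical helper (not from the answer): returns the trimmed text of all
// nodes matched by an XPath expression in an HTML string.
function extract_text(string $html, string $expression): string
{
    // Real-world pages are rarely valid HTML; collect parser warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $text = '';
    foreach ($xpath->query($expression) as $node) {
        // textContent is the plain text of the node, without markup
        $text .= $node->textContent;
    }
    return trim($text);
}

// Shortened sample markup, assumed for illustration only.
$html = '<div id="first"><div id="third">Lorem</div> Lorem Ipsum Dolorez '
      . '<script></script> ignored '
      . '<div id="second"> Lorem Ipsum <a href="">link</a></div></div>';

// Text siblings of div#third that still have a <script> after them.
$first = extract_text(
    $html,
    '//div[@id="first"]/div[@id="third"]/following-sibling::text()[following::script]'
);

// Only the direct text nodes of div#second, so the link text is skipped.
$second = extract_text($html, '//div[@id="second"]/text()');

echo $first, PHP_EOL;
echo $second, PHP_EOL;
```

Returning textContent instead of saveHTML() output gives plain strings, which may be easier to store or compare; keep saveHTML() if the inner markup matters.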

2013-04-03 18:32:50 complex857

Thank you very much. You made my day. Thanks for the explanation of the code, especially the first part.

Try using the strip_tags PHP function. For example:

echo strip_tags('<div id="second">Lorem Ipsum[...]<a href=""/></div>');

Lorem Ipsum[...]

http://php.net/manual/en/function.strip-tags.php
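
One caveat worth illustrating: strip_tags() removes the tags but keeps the text between them, so a link that has text would still show up in the result. The sketch below assumes a link text ("more") that is not in the original markup, and the regex pre-pass is an assumed workaround, not something the answer above proposes.

```php
<?php
// Sample fragment; the link text "more" is an assumption added for illustration.
$fragment = '<div id="second">Lorem Ipsum[...]<a href="">more</a></div>';

// strip_tags() removes the tags only, so the link text stays in the result.
$with_link_text = strip_tags($fragment);

// Assumed workaround: drop whole <a>...</a> elements first, then strip the rest.
// A regex is fragile on arbitrary HTML; it is used here only to keep the sketch short.
$without_links = strip_tags(preg_replace('~<a\b[^>]*>.*?</a>~is', '', $fragment));

echo $with_link_text, PHP_EOL; // Lorem Ipsum[...]more
echo $without_links, PHP_EOL;  // Lorem Ipsum[...]
```

For anything beyond a small, predictable fragment, a DOM parser is the safer way to exclude specific elements.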

2013-04-03 17:31:44 ezekielDFM

Per the simple_html_dom reference: http://simplehtmldom.sourceforge.net/

You can do something like this:

$html->find('div[id=third]', 0)->plaintext

2013-04-03 18:36:37 MMCACHRAN
