UPDATE
It looks like you wanted a flat list, so here is an updated example specific to that case to avoid any confusion:
$html = '<div class="par">
<p class="pp">
<span class="dv">1 </span>Blah blah blah blah. <span class="dv">2 </span> Yada
yada yada yada. <span class="dv">3 </span>Foo foo foo foo.
</p>
</div>
<div class="par">
<p class="pp">
<span class="dv">4 </span>Hmm hmm hmm hmm.
</p>
</div>';
// loadHTML() is an instance method, so create the document first
$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);
// select THE TEXT NODES of all p elements with the class pp
// - note that this matches only when the attribute is exactly class="pp",
// not when "pp" is just one of several classes; you may need to change this
// depending on your markup. Post additional questions for specific XPath help.
$found = $finder->query('//p[@class="pp"]/text()');
$nodes = array();
// simply transform the resulting DOMNodeList into an array
// for easier consumption/manipulation
foreach ($found as $textNode) {
    // append to $nodes, the array that is printed below
    $nodes[] = $textNode->nodeValue;
}
print_r($nodes);
Produces:
Array
(
[0] =>
[1] => Blah blah blah blah.
[2] => Yada
yada yada yada.
[3] => Foo foo foo foo.
[4] =>
[5] => Hmm hmm hmm hmm.
)
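The comments in the code above point out that `@class="pp"` matches only when the attribute is exactly `pp`. As a minimal sketch, assuming markup where a `p` can carry several classes (the sample HTML here is made up for illustration), the usual XPath 1.0 idiom pads the class list with spaces and looks for ` pp ` as a whole token:

```php
<?php
// Hypothetical markup: the second <p> has two classes, the third has a look-alike class.
$html = '<div><p class="pp">one</p><p class="pp intro">two</p><p class="ppx">three</p></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);

// Pad the normalized class list with spaces, then search for " pp " as a whole token.
$query = '//p[contains(concat(" ", normalize-space(@class), " "), " pp ")]';

$matches = array();
foreach ($finder->query($query) as $p) {
    $matches[] = $p->textContent;
}

print_r($matches);
```

The third paragraph is skipped because `ppx` only contains `pp` as a substring, not as a separate class token.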
If the situation is always this simple, I think you can use XPath to get the contents of the child DOMText nodes in p.pp.
$html = '<div class="par">
<p class="pp">
<span class="dv">1 </span>Blah blah blah blah. <span class="dv">2 </span> Yada
yada yada yada. <span class="dv">3 </span>Foo foo foo foo.
</p>
</div>
<div class="par">
<p class="pp">
<span class="dv">4 </span>Hmm hmm hmm hmm.
</p>
</div>';
// loadHTML() is an instance method, so create the document first
$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);
// select all p elements with the class pp
// - note that this matches only when the attribute is exactly class="pp",
// not when "pp" is just one of several classes; you may need to change this
// depending on your markup. Post additional questions for specific XPath help.
$found = $finder->query('//p[@class="pp"]');
$nodes = array();
foreach ($found as $p) {
    // for each p element, pull its text nodes.
    $textNodes = $finder->query('text()', $p);
    $textStr = '';
    // loop over the textNodes and concat them into a single string
    foreach ($textNodes as $n) {
        $textStr .= $n->nodeValue;
    }
    // push the compiled string onto the array
    $nodes[] = $textStr;
}
print_r($nodes);
This will produce a result like:
Array
(
[0] =>
Blah blah blah blah. Yada
yada yada yada. Foo foo foo foo.
[1] =>
Hmm hmm hmm hmm.
)
If you really want each text node kept separate, you just need to change the loop:
foreach ($found as $p) {
    // for each p element, pull its text nodes.
    $textNodes = $finder->query('text()', $p);
    $textArr = array();
    // loop over the textNodes and collect each value as its own array entry
    foreach ($textNodes as $n) {
        $textArr[] = $n->nodeValue;
    }
    // push the array of text values for this p onto the result
    $nodes[] = $textArr;
}
which will give you:
Array
(
[0] => Array
(
[0] =>
[1] => Blah blah blah blah.
[2] => Yada
yada yada yada.
[3] => Foo foo foo foo.
)
[1] => Array
(
[0] =>
[1] => Hmm hmm hmm hmm.
)
)
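As you can see, it has also picked up the line breaks. If they are unwanted, you can easily filter them out with the array filtering method of your choice. Alternatively, you could look into the XPath and DOMDocument settings to adjust this. IIRC there are some settings that control how whitespace is interpreted (or not), which might let you avoid the problem, but that may not be desirable if you are doing other processing with the same DOMDocument instance.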
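Both routes can be sketched briefly. This is only an illustration under assumptions: the sample HTML is a shortened copy of the markup above, and the variable names `$raw`, `$cleaned`, and `$nonBlank` are made up for the example. Option 1 cleans the values in PHP after the query. Option 2 uses an XPath predicate, `text()[normalize-space()]`, to skip whitespace-only text nodes in the query itself, without touching any DOMDocument settings.

```php
<?php
// Shortened copy of the markup from the examples above.
$html = '<div class="par"><p class="pp">
<span class="dv">1 </span>Blah blah blah blah. <span class="dv">2 </span> Yada
yada yada yada.
</p></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);

// Option 1: collect every text node, then clean up in PHP.
$raw = array();
foreach ($finder->query('//p[@class="pp"]/text()') as $textNode) {
    $raw[] = $textNode->nodeValue;
}
// Collapse runs of whitespace (including newlines) and trim each value.
$cleaned = array_map(function ($text) {
    return trim(preg_replace('/\s+/', ' ', $text));
}, $raw);
// Drop the entries that are now empty and reindex from 0.
$cleaned = array_values(array_filter($cleaned, 'strlen'));

// Option 2: let XPath skip text nodes that contain only whitespace.
$nonBlank = array();
foreach ($finder->query('//p[@class="pp"]/text()[normalize-space()]') as $textNode) {
    $nonBlank[] = trim($textNode->nodeValue);
}

print_r($cleaned);
print_r($nonBlank);
```

Option 2 removes the empty entries but leaves line breaks inside a text node untouched, so combine it with the `preg_replace` step if you also want those collapsed.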
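It might be a good idea to tell us whether "two sets of tags" fits your example. –
Between the sets of SPAN tags. But I do realize that the last piece of text I want won't be BETWEEN two sets, just after the last span tag... – genechunlee
If the situation is always this simple, I think you can use XPath to get the contents of the child DOMText nodes in 'p.pp'. – prodigitalson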