PHP XPath query: extracting specific characters from the href of an a tag

<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
<span class="Xbkznofv">24/11/1859</span>

How can I use an XPath query on the a tags to get the ID numbers out of the href?

The result I want looks like this:

5809, 124, 196, 24/11/1859

PHP code:

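$url = 'http://www.example.com/Books/Default.aspx';
libxml_use_internal_errors(true); // collect warnings from malformed HTML instead of printing them
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
$xpath = new DOMXPath($doc);

// One node list per kind of link, plus the date span
$elements1 = $xpath->query('//a[contains(@href, "www.example.com/") and contains(@href, "/book")]');
$elements2 = $xpath->query('//a[contains(@href, "www.example.com/author/id=")]');
$elements3 = $xpath->query('//a[contains(@href, "www.example.com/") and contains(@href, "/genres")]');
$elements4 = $xpath->query('//span[@class]');

// The original loop read an undefined $elements; iterate over all four lists instead
foreach ([$elements1, $elements2, $elements3, $elements4] as $elements) {
    if ($elements === false) {
        continue; // query() returns false for an invalid expression
    }
    foreach ($elements as $element) {
        echo "<br/>";
        foreach ($element->childNodes as $node) {
            echo $node->nodeValue . "\n";
        }
    }
}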

Source

2017-07-15 Robin Vlaar

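XPath 1.0 has a limited set of string functions, but at some point it becomes easier to read the attribute and extract the value with a regular expression.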
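To illustrate the regular-expression route mentioned above, here is a minimal sketch that reads every href with XPath and pulls the digits out with preg_match(). The pattern is an assumption based only on the three sample URLs in the question (an ID is a run of digits forming a whole path segment, optionally prefixed with "id="); real URLs may need a different pattern.

```php
<?php
// Sample markup from the question (with the href quotes closed).
$html = <<<'HTML'
<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
<span class="Xbkznofv">24/11/1859</span>
HTML;

libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$ids = [];
// Select the href attribute nodes, then let the regex find the ID.
foreach ($xpath->query('//a/@href') as $href) {
    // Assumed pattern: "/", optional "id=", digits, then "/" or end of string
    if (preg_match('~/(?:id=)?(\d+)(?:/|$)~', $href->nodeValue, $match)) {
        $ids[] = $match[1];
    }
}

echo implode(',', $ids), "\n";
```

The XPath expression stays trivial here, and all the URL-specific logic lives in one pattern that can be adjusted without touching the query.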

That said, here is an example that uses XPath alone:

$html = <<<'HTML'
<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
<span class="Xbkznofv">24/11/1859</span>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$data = [
    // Text content of the book link
    'book_title' => $xpath->evaluate(
        'string(//a[contains(@href, "www.example.com") and contains(@href, "/book")])'
    ),
    // Path segment between the host name and the next slash
    'book_id' => $xpath->evaluate(
        'substring-before(
            substring-after(
                //a[contains(@href, "www.example.com") and contains(@href, "/book")]/@href,
                "www.example.com/"
            ),
            "/"
        )'
    ),
    // Everything after "/id=" in the author link
    'author_id' => $xpath->evaluate(
        'substring-after(
            //a[contains(@href, "www.example.com/author/id=")]/@href,
            "/id="
        )'
    ),
];

var_dump($data);

Output:

array(3) { 
    ["book_title"]=> 
    string(17) "Origin of Species" 
    ["book_id"]=> 
    string(4) "5809" 
    ["author_id"]=> 
    string(3) "124" 
}

These expressions work only with DOMXPath::evaluate(); DOMXPath::query() can only return node lists.
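The difference can be seen in a small sketch: a plain location path gives a DOMNodeList, while expressions wrapped in string(), count() or the substring functions give scalars through evaluate(). It reuses the genre link and date span from the question to fetch the two values the example above leaves out; the variable names are illustrative only.

```php
<?php
$html = <<<'HTML'
<a href="http://www.example.com/196/genres">Science, Biology</a>
<span class="Xbkznofv">24/11/1859</span>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// A location path yields a node list.
$links = $xpath->query('//a[contains(@href, "/genres")]');
var_dump($links instanceof DOMNodeList, $links->length);

// Scalar expressions need evaluate().
$genreId = $xpath->evaluate(
    'substring-before(
        substring-after(//a[contains(@href, "/genres")]/@href, "www.example.com/"),
        "/"
    )'
);
$date = $xpath->evaluate('string(//span[@class="Xbkznofv"])');
$linkCount = $xpath->evaluate('count(//a)'); // XPath numbers arrive as PHP floats

var_dump($genreId, $date, $linkCount);
```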

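Most of the time you will use one expression to fetch a node list, iterate over it, and then use several further expressions to fetch the values. Here is a simple example: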

$html = <<<'HTML'
<div class="book">
    <a href="#1">Origin of Species</a>
</div>
<div class="book">
    <a href="#2">On the Shoulders of Giants</a>
</div>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// The second argument makes each expression relative to the current book node
foreach ($xpath->evaluate('//div[@class="book"]') as $book) {
    var_dump(
        $xpath->evaluate('string(.//a)', $book),
        $xpath->evaluate('string(.//a/@href)', $book)
    );
}

Output:

string(17) "Origin of Species" 
string(2) "#1" 
string(26) "On the Shoulders of Giants" 
string(2) "#2"

Source

2017-07-17 09:00:27 ThW

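Thank you very much. One more thing: how can I make this example work in a foreach loop for multiple books, authors and so on, with the results in comma-separated format?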

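The second argument of DOMXPath::evaluate() is the context node. You need to do something like $xpath->evaluate('string(.//a)', $outerNode). The .// prefix selects descendants of the current context node. For writing CSV, look up fputcsv(). – ThW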

I have only just started learning. Could you write a simple example, if you are not too busy? It matters a lot to me.
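A minimal sketch of what the comment above describes, combining the context-node argument with fputcsv(). The wrapper markup is hypothetical: it assumes each book's links sit inside their own div with class "book", which the question does not show, and the second record uses made-up values purely for illustration.

```php
<?php
// Hypothetical markup: one <div class="book"> per book (an assumption).
$html = <<<'HTML'
<div class="book">
  <a href="http://www.example.com/5809/book">Origin of Species</a>
  <a href="http://www.example.com/author/id=124">Darwin</a>
  <a href="http://www.example.com/196/genres">Science, Biology</a>
  <span class="Xbkznofv">24/11/1859</span>
</div>
<div class="book">
  <a href="http://www.example.com/6001/book">On the Shoulders of Giants</a>
  <a href="http://www.example.com/author/id=305">Hawking</a>
  <a href="http://www.example.com/201/genres">Science, Physics</a>
  <span class="Xbkznofv">01/10/2002</span>
</div>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// Expressions start with ".//" so they only look inside the current book node.
$idBefore = 'substring-before(substring-after(%s/@href, "www.example.com/"), "/")';

$rows = [];
foreach ($xpath->evaluate('//div[@class="book"]') as $book) {
    $rows[] = [
        $xpath->evaluate(sprintf($idBefore, './/a[contains(@href, "/book")]'), $book),
        $xpath->evaluate('substring-after(.//a[contains(@href, "/author/id=")]/@href, "/id=")', $book),
        $xpath->evaluate(sprintf($idBefore, './/a[contains(@href, "/genres")]'), $book),
        $xpath->evaluate('string(.//span)', $book),
    ];
}

// Write one comma-separated line per book to standard output.
$out = fopen('php://output', 'w');
foreach ($rows as $row) {
    fputcsv($out, $row, ',', '"', '\\');
}
fclose($out);
```

Replacing php://output with a file path would write the same lines to a file instead.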
