2017-07-15

PHP XPath query to get specific characters from a tag's href

<a href="http://www.example.com/5809/book">Origin of Species</a> 
<a href="http://www.example.com/author/id=124">Darwin</a> 
<a href="http://www.example.com/196/genres">Science, Biology</a> 
<span class="Xbkznofv">24/11/1859</span> 

How can I use an XPath query on these tags to get the ID numbers from the href?

The result I want looks like this, for example:

5809,124,196,24/11/1859

PHP code:

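$url = 'http://www.example.com/Books/Default.aspx';
libxml_use_internal_errors(true); // suppress warnings from malformed HTML
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
$xpath = new DOMXpath($doc);

// One query per kind of element; the patterns have to match the actual hrefs
$elements1 = $xpath->query('//a[contains(@href, "www.example.com/") and contains(@href, "/book")]');
$elements2 = $xpath->query('//a[contains(@href, "www.example.com/author/id=")]');
$elements3 = $xpath->query('//a[contains(@href, "www.example.com/") and contains(@href, "/genres")]');
// contains(@class, "") is true for every span, so compare the class directly
$elements4 = $xpath->query('//span[@class="Xbkznofv"]');

foreach ([$elements1, $elements2, $elements3, $elements4] as $elements) {
    // query() returns false if the expression is invalid
    if ($elements === false) {
        continue;
    }
    foreach ($elements as $element) {
        echo "<br/>";
        // prints the text content, not the ID inside the href
        foreach ($element->childNodes as $node) {
            echo $node->nodeValue . "\n";
        }
    }
}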

Answer


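XPath 1.0 has some limited string functions, but at some point it becomes easier to read the attribute and extract the value with a regular expression.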
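A minimal sketch of the regular-expression route, assuming the hrefs always have the shapes shown in the question (`/<digits>/book`, `/author/id=<digits>`, `/<digits>/genres`). XPath only selects the links; `getAttribute()` reads each href and `preg_match()` pulls out the number. The pattern is an illustration for this sample markup, not a general URL parser.

```php
<?php
libxml_use_internal_errors(true); // tolerate sloppy real-world HTML

// Sample markup from the question (with the href quotes closed)
$html = <<<'HTML'
<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$ids = [];
foreach ($xpath->evaluate('//a[@href]') as $link) {
    $href = $link->getAttribute('href');
    // Assumed URL shapes: ".../<digits>/book", ".../<digits>/genres" or ".../id=<digits>"
    if (preg_match('~/(\d+)/(?:book|genres)$|/id=(\d+)$~', $href, $match)) {
        // Only one of the two capture groups takes part in a given match
        $ids[] = $match[1] !== '' ? $match[1] : $match[2];
    }
}

echo implode(',', $ids), "\n";
```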

Still, here is an example that uses only XPath:

$html = <<<'HTML' 
<a href="http://www.example.com/5809/book">Origin of Species</a> 
<a href="http://www.example.com/author/id=124">Darwin</a> 
<a href="http://www.example.com/196/genres">Science, Biology</a> 
<span class="Xbkznofv">24/11/1859</span> 
HTML; 

$document = new DOMDocument(); 
$document->loadHtml($html); 
$xpath = new DOMXpath($document); 

$data = [
    'book_title' => $xpath->evaluate(
        'string(//a[contains(@href, "www.example.com") and contains(@href, "/book")])'
    ),
    'book_id' => $xpath->evaluate(
        'substring-before(
            substring-after(
                //a[contains(@href, "www.example.com") and contains(@href, "/book")]/@href,
                "www.example.com/"
            ),
            "/"
        )'
    ),
    'author_id' => $xpath->evaluate(
        'substring-after(
            //a[contains(@href, "www.example.com/author/id=")]/@href,
            "/id="
        )'
    )
];

var_dump($data); 

Output:

array(3) { 
    ["book_title"]=> 
    string(17) "Origin of Species" 
    ["book_id"]=> 
    string(4) "5809" 
    ["author_id"]=> 
    string(3) "124" 
} 
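The same XPath-only approach extends to the two remaining fields. The sketch below assumes the markup from the question (it has not been checked against the real page): it adds the genre ID and the date, then joins the values with `implode()` to get the comma-separated line the question asks for. Conditions such as `contains(@href, "/book")` are assumptions about how the real hrefs are shaped.

```php
<?php
libxml_use_internal_errors(true);

$html = <<<'HTML'
<a href="http://www.example.com/5809/book">Origin of Species</a>
<a href="http://www.example.com/author/id=124">Darwin</a>
<a href="http://www.example.com/196/genres">Science, Biology</a>
<span class="Xbkznofv">24/11/1859</span>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$row = [
    // "5809" from ".../5809/book": take the part after the host, then cut at the next "/"
    'book_id' => $xpath->evaluate(
        'substring-before(substring-after(//a[contains(@href, "/book")]/@href, "www.example.com/"), "/")'
    ),
    // "124" from ".../author/id=124"
    'author_id' => $xpath->evaluate(
        'substring-after(//a[contains(@href, "/author/id=")]/@href, "/id=")'
    ),
    // "196" from ".../196/genres", same idea as book_id
    'genre_id' => $xpath->evaluate(
        'substring-before(substring-after(//a[contains(@href, "/genres")]/@href, "www.example.com/"), "/")'
    ),
    // the date is simply the text content of the span
    'date' => $xpath->evaluate('string(//span[@class="Xbkznofv"])'),
];

echo implode(',', $row), "\n";
```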

These expressions only work with DOMXpath::evaluate(); DOMXpath::query() can only return node lists.
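To make the difference concrete, here is a small sketch with made-up markup: `evaluate()` returns a float, string or boolean when the expression produces one, and a `DOMNodeList` for a plain location path, while `query()` is the method for node-set expressions.

```php
<?php
libxml_use_internal_errors(true);

$document = new DOMDocument();
$document->loadHTML('<a href="#1">One</a><a href="#2">Two</a>');
$xpath = new DOMXPath($document);

// evaluate() returns whatever type the expression produces
$count = $xpath->evaluate('count(//a)');        // float (XPath numbers are doubles)
$text  = $xpath->evaluate('string(//a/@href)'); // string of the first match in document order
$found = $xpath->evaluate('boolean(//a)');      // bool
$nodes = $xpath->evaluate('//a');               // DOMNodeList

// query() is meant for node-set expressions and gives a DOMNodeList
$list = $xpath->query('//a');

var_dump($count, $text, $found, $nodes instanceof DOMNodeList, $list->length);
```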

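Most of the time you will use one expression to fetch a list of nodes, iterate over them, and use further expressions, relative to each node, to fetch the values. Here is a simple example: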

$html = <<<'HTML' 
<div class="book"> 
    <a href="#1">Origin of Species</a> 
</div> 
<div class="book"> 
    <a href="#2">On the Shoulders of Giants</a> 
</div> 
HTML; 

$document = new DOMDocument(); 
$document->loadHtml($html); 
$xpath = new DOMXpath($document); 

foreach ($xpath->evaluate('//div[@class="book"]') as $book) {
    var_dump(
        $xpath->evaluate('string(.//a)', $book),
        $xpath->evaluate('string(.//a/@href)', $book)
    );
}

Output:

string(17) "Origin of Species" 
string(2) "#1" 
string(26) "On the Shoulders of Giants" 
string(2) "#2" 

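Thank you very much. One more thing... how can I make this example work in a foreach loop for several books, authors, ... with the results in comma-separated format?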


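The second argument of DOMXpath::evaluate() is the context node. You need to do something like `$xpath->evaluate('string(.//a)', $outerNode)`. `.//` selects descendants of the current context node. For writing CSV, look up `fputcsv()`. – ThW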
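A minimal sketch of what ThW describes, assuming (hypothetically) that each book's links and date are wrapped in a `<div class="book">`, as in the second example above; the second book entry is invented sample data. The outer `evaluate()` selects the books, each inner expression uses `$book` as the context node, and `fputcsv()` writes one line per book.

```php
<?php
libxml_use_internal_errors(true);

// Hypothetical markup: one <div class="book"> per book; the second entry is made up
$html = <<<'HTML'
<div class="book">
  <a href="http://www.example.com/5809/book">Origin of Species</a>
  <a href="http://www.example.com/author/id=124">Darwin</a>
  <a href="http://www.example.com/196/genres">Science, Biology</a>
  <span class="Xbkznofv">24/11/1859</span>
</div>
<div class="book">
  <a href="http://www.example.com/6001/book">Example Book</a>
  <a href="http://www.example.com/author/id=125">Example Author</a>
  <a href="http://www.example.com/197/genres">Example Genre</a>
  <span class="Xbkznofv">01/01/1900</span>
</div>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$rows = [];
foreach ($xpath->evaluate('//div[@class="book"]') as $book) {
    // Passing $book as the second argument makes ".//" search only inside this book
    $rows[] = [
        $xpath->evaluate('substring-before(substring-after(.//a[contains(@href, "/book")]/@href, "www.example.com/"), "/")', $book),
        $xpath->evaluate('substring-after(.//a[contains(@href, "/author/id=")]/@href, "/id=")', $book),
        $xpath->evaluate('substring-before(substring-after(.//a[contains(@href, "/genres")]/@href, "www.example.com/"), "/")', $book),
        $xpath->evaluate('string(.//span[@class="Xbkznofv"])', $book),
    ];
}

// fputcsv() takes care of quoting, e.g. for values that contain commas
$out = fopen('php://output', 'w');
foreach ($rows as $row) {
    fputcsv($out, $row, ',', '"', '\\');
}
fclose($out);
```

To write a file instead of printing, open a file path with `fopen($path, 'w')` in place of `php://output`.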


I've only just started learning. Could you write a simple example, if you're not busy? It's important to me.