2016-02-12 107 views

I want to extract the title and the YouTube link from the XML below. Why doesn't my XPath query work?

`<?xml version="1.0" encoding="UTF-8"?><feed  xmlns="http://www.w3.org/2005/Atom"><category term="videos" label="/r/videos"/> <icon>https://www.redditstatic.com/icon.png/</icon><id>/r/videos/.xml</id><link  rel="self" href="https://www.reddit.com/r/videos/.xml"  type="application/atom+xml" /><link rel="alternate" href="https://www.reddit.com/r/videos/" type="text/html" /><logo>https://a.thumbs.redditmedia.com/mtwnduVr0DnrK1o8rpTPi6waLWuPimj_8ntK8i5t890.png</logo><subtitle>A great place for video content of all kinds.</subtitle><title>Videos</title><entry><author><name>/u/LegendaryContent</name><uri>https://www.reddit.com/user/LegendaryContent</uri></author><category term="videos" label="/r/videos"/><content type="html">&lt;table&gt; &lt;tr&gt;&lt;td&gt; &lt;a href=&quot;https://www.reddit.com/r/videos/comments/45crp7/1400_employees_being_laid_off/&quot;&gt; &lt;img src=&quot;https://b.thumbs.redditmedia.com/UR4XFRqoMtj5watvSUrUlEdTYiA1gOv_OxqxtxNyftQ.jpg&quot; alt=&quot;1,400 Employees being laid off&quot; title=&quot;1,400 Employees being laid off&quot; /&gt; &lt;/a&gt; &lt;/td&gt;&lt;td&gt; &amp;#32; submitted by &amp;#32; &lt;a href=&quot;https://www.reddit.com/user/LegendaryContent&quot;&gt; /u/LegendaryContent &lt;/a&gt; &lt;br/&gt; &lt;span&gt;&lt;a href=&quot;https://youtu.be/Y3ttxGMQOrY&quot;&gt;[link]&lt;/a&gt;&lt;/span&gt; &amp;#32; &lt;span&gt;&lt;a href=&quot;https://www.reddit.com/r/videos/comments/45crp7/1400_employees_being_laid_off/&quot;&gt;[comments]&lt;/a&gt;&lt;/span&gt; &lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</content><id>t3_45crp7</id><link href="https://www.reddit.com/r/videos/comments/45crp7/1400_employees_being_laid_off/" /><updated>2016-02-12T03:22:38+00:00</updated><title>1,400 Employees being laid off</title></entry></feed>` 

Here is my code:

<?php
$videos = "";
$video_category = "Trending Videos";
$url = "https://www.reddit.com/r/videos/.xml";
$feed_dom = new DOMDocument;
$feed_dom->preserveWhiteSpace = false; // has to be set before load() to take effect
$feed_dom->load($url);
$items = $feed_dom->getElementsByTagName('entry');

foreach ($items as $item) {
    $title = $item->getElementsByTagName('title')->item(0)->nodeValue;
    // nodeValue returns the unescaped HTML stored in <content>
    $desc_table = $item->getElementsByTagName('content')->item(0)->nodeValue;

    $table_dom = new DOMDocument;
    $table_dom->preserveWhiteSpace = false;
    $table_dom->loadHTML($desc_table);
    $xpath = new DOMXPath($table_dom);
    // The query in question: it matches no nodes
    $yt_link_node = $xpath->query("//table/tr/td[2]/a[2]");

    foreach ($yt_link_node as $yt_link) {
        $yt = $yt_link->getAttribute('href');
        echo $title;
        echo $yt;
    }
}
?>

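For some reason it doesn't work, even though I have tried almost every XPath query I could find on Google and Stack Overflow. The title echoes fine, but $yt does not. Can you spot what I am doing wrong?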

Answer


This is because the DOM is slightly different from what you expect.

The HTML you are parsing there ($desc_table) typically has this structure:

<table> 
    <tr> 
     <td> 
      <a href="https://www.reddit.com/r/videos/comments/..."> 
       <img src="https://b.thumbs.redditmedia.com/....jpg" 
        alt="..." title="..." /> 
      </a> 
     </td> 
     <td> &#32; submitted by &#32; 
      <a href="https://www.reddit.com/user/..."> /u/... </a> 
      <br/> 
      <span> 
       <a href="https://youtu.be/...">[link]</a> 
      </span> 
      &#32; 
      <span> 
       <a href="https://www.reddit.com/r/videos/comments/.../">[comments]</a> 
      </span> 
     </td> 
    </tr> 
</table> 

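So there is no second anchor element (a) that is a direct child of the second td element, because the second (and third) anchors are wrapped in span tags. The only anchor directly under that td is the user link, so a[2] matches nothing.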
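
One way to check this is to count how many anchors each path matches. The following is a minimal sketch on a trimmed copy of the markup; the href values are placeholders chosen for illustration, not values from the feed.

```php
<?php
// Trimmed stand-in for $desc_table; the "#..." hrefs are placeholders.
$html = '<table><tr><td><a href="#thumb">thumb</a></td>'
      . '<td>submitted by <a href="#user">/u/example</a><br/>'
      . '<span><a href="#video">[link]</a></span> '
      . '<span><a href="#comments">[comments]</a></span></td></tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// "/" selects direct children only: td[2] has a single <a> child (the user link).
$childAnchors = $xpath->query('//table/tr/td[2]/a')->length;

// "//" selects descendants at any depth, so the anchors inside <span> are counted too.
$descendantAnchors = $xpath->query('//table/tr/td[2]//a')->length;

printf("direct children: %d, descendants: %d\n", $childAnchors, $descendantAnchors);
```

With only one direct child anchor, the positional predicate [2] on that step has nothing to select.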

So, if you want to get this link:

   <a href="https://youtu.be/...">[link]</a> 

then use this XPath instead:

$yt_link_node = $xpath->query("//table/tr/td[2]/span[1]/a");
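
Put together, the loop from the question could look roughly like the sketch below. To keep it runnable without network access it parses an inline, trimmed stand-in for the feed instead of loading the URL; the entry title, the `#...` hrefs, and the `https://youtu.be/EXAMPLE` link are made-up placeholders, and `$results` is a name introduced here for illustration.

```php
<?php
// Trimmed stand-in for the Atom feed; all entry values are placeholders.
$xml = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Videos</title>
  <entry>
    <content type="html">&lt;table&gt;&lt;tr&gt;&lt;td&gt;&lt;a href=&quot;#thumb&quot;&gt;thumb&lt;/a&gt;&lt;/td&gt;&lt;td&gt;submitted by &lt;a href=&quot;#user&quot;&gt;/u/example&lt;/a&gt;&lt;br/&gt;&lt;span&gt;&lt;a href=&quot;https://youtu.be/EXAMPLE&quot;&gt;[link]&lt;/a&gt;&lt;/span&gt; &lt;span&gt;&lt;a href=&quot;#comments&quot;&gt;[comments]&lt;/a&gt;&lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</content>
    <title>Example title</title>
  </entry>
</feed>
XML;

$feed = new DOMDocument();
$feed->loadXML($xml);

$results = [];
foreach ($feed->getElementsByTagName('entry') as $entry) {
    $title   = $entry->getElementsByTagName('title')->item(0)->nodeValue;
    // nodeValue yields the unescaped HTML fragment stored in <content>
    $content = $entry->getElementsByTagName('content')->item(0)->nodeValue;

    $table = new DOMDocument();
    $table->loadHTML($content);
    $xpath = new DOMXPath($table);

    // Step through the first <span> in td[2] to reach the [link] anchor.
    $links = $xpath->query('//table/tr/td[2]/span[1]/a');
    if ($links->length > 0) {
        $results[$title] = $links->item(0)->getAttribute('href');
    }
}

print_r($results);
```

Checking `length` before calling `item(0)` keeps an entry whose markup differs from this layout from causing an error; such entries are simply skipped.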