2017-01-29 121 views
1

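How to easily parse an XML document with namespaces using SimpleXML in PHP?

I want to parse the YouTube feed of the top 15 videos. An excerpt of the feed I am trying to parse looks like this: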

<entry> 
    <title>The Title</title> 
    <link href="http://example.com" /> 
    <media:thumbnail url="http://example.com/image.png" /> 
    <media:description>The Description</media:description> 
    <media:statistics views="123456" /> 
    <pubDate>29/01/2017</pubDate> 
</entry> 

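I can't capture the values of any of the tags that start with <media:. I'm using the code below to parse the data; the commented-out lines are the ones that don't work.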

foreach ($xml->entry as $val) { 
    echo "<item>".PHP_EOL; 
    echo "<title>".$val->title."</title>".PHP_EOL; 
    echo "<link>".$val->link["href"]."</link>".PHP_EOL; 
    //echo "<image>".$val->media:thumbnail["url"]."</image>".PHP_EOL; 
    //echo "<description>".$val->media:description."</description>".PHP_EOL; 
    //echo "<views>".$val->media:statistics["views"]."</views>".PHP_EOL; 
    echo "<pubDate>".$val->published."</pubDate>".PHP_EOL; 
    echo "</item>".PHP_EOL; 
} 

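How can I get the values of these tags without setting up the namespace? Doing a var_dump on $xml->entry doesn't even show the namespaced elements. Is there a better built-in function for converting XML to an array?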

+0

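Your XML is not well-formed (i.e. invalid). Per the W3C's [Namespaces in XML 1.0](https://www.w3.org/TR/REC-xml-names/#ns-using): *the namespace prefix, unless it is xml or xmlns, must have been declared in a namespace declaration attribute*. So the 'media' prefix should be declared. – Parfait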
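To see the point of this comment in practice, here is a minimal sketch. The two sample strings are hypothetical, trimmed from the question's excerpt; the namespace URI is the one used for 'media' elsewhere in this thread. It asks libxml to collect its diagnostics so the undeclared prefix shows up as an error entry rather than a PHP warning.

```php
<?php
// Hypothetical sample: part of the question's excerpt, with no xmlns:media declaration.
$undeclared = '<entry><title>The Title</title>'
    . '<media:description>The Description</media:description></entry>';

// Collect libxml diagnostics instead of emitting PHP warnings.
libxml_use_internal_errors(true);
$entry = simplexml_load_string($undeclared);
$errors = libxml_get_errors();
libxml_clear_errors();

foreach ($errors as $error) {
    // Each LibXMLError carries a message and the line it refers to.
    echo trim($error->message), " (line {$error->line})", PHP_EOL;
}

// The same excerpt with the prefix declared on the root element.
$declared = '<entry xmlns:media="http://search.yahoo.com/mrss/">'
    . '<title>The Title</title>'
    . '<media:description>The Description</media:description></entry>';
$entry = simplexml_load_string($declared);
$declaredErrors = libxml_get_errors();
libxml_clear_errors();

echo count($declaredErrors), ' error(s) once the prefix is declared', PHP_EOL;
```

In a real feed the declaration normally sits on the root element, so an excerpt cut from the middle of the document loses it.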

+0

This takes more work with DOM + Xpath. Register your own prefixes on the DOMXpath instance and use DOMXpath::evaluate() to fetch node lists and values. – ThW

+0

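I don't have time to write a full answer right now, but the method you're looking for is ['->children()'](http://php.net/manual/en/simplexmlelement.children.php). In your case '$val->children('media', true)->description' would work, although I'd recommend hard-coding the actual namespace URI (from the 'xmlns:media' attribute) rather than the prefix, in case the source document is regenerated with a different prefix. – IMSoP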
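The two ways of calling children() described in this comment can be sketched as follows. The sample document is hypothetical: it is the question's excerpt wrapped in a <feed> root that declares the 'media' prefix. Note that attributes are read through ->attributes() rather than array syntax, because an element reached via children() keeps looking in that namespace, while unprefixed attributes are in no namespace.

```php
<?php
// Hypothetical feed excerpt based on the question, with the prefix declared.
$xmlString = <<<XML
<feed xmlns:media="http://search.yahoo.com/mrss/">
  <entry>
    <title>The Title</title>
    <media:thumbnail url="http://example.com/image.png" />
    <media:description>The Description</media:description>
    <media:statistics views="123456" />
  </entry>
</feed>
XML;

$xml = simplexml_load_string($xmlString);

foreach ($xml->entry as $val) {
    // Select by prefix: the second argument (true) marks 'media' as a prefix.
    $byPrefix = $val->children('media', true);

    // Select by namespace URI: keeps working if the source changes its prefix.
    $byUri = $val->children('http://search.yahoo.com/mrss/');

    $description = (string) $byPrefix->description;

    // attributes() with no argument selects the attributes in no namespace.
    $thumbnail = (string) $byUri->thumbnail->attributes()->url;
    $views     = (string) $byUri->statistics->attributes()->views;

    echo $description, PHP_EOL, $thumbnail, PHP_EOL, $views, PHP_EOL;
}
```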

Answers

0

The code provided by IMSoP got me to my answer. The PHP snippet I ended up using was adapted from the link above, for XML similar to the OP's:

// Namespace URIs used below; $xml is the SimpleXMLElement for the feed root.
define('NS_ATOM', 'http://www.w3.org/2005/Atom');
define('NS_MEDIA', 'http://search.yahoo.com/mrss/');
// Assumed value; copy the actual URI from the feed's xmlns:yt attribute.
define('NS_YT', 'http://www.youtube.com/xml/schemas/2015');

foreach ($xml->children(NS_ATOM)->entry as $entry) {
    echo "<item>".PHP_EOL;
    echo "<title>".$entry->title."</title>".PHP_EOL;
    echo "<link>".$entry->link->attributes(null)->href."</link>".PHP_EOL;
    echo "<image>".$entry->children(NS_MEDIA)->group->children(NS_MEDIA)->thumbnail->attributes(null)->url."</image>".PHP_EOL;
    echo "<description>".$entry->children(NS_MEDIA)->group->children(NS_MEDIA)->description."</description>".PHP_EOL;
    echo "<guid>".$entry->children(NS_YT)->videoId."</guid>".PHP_EOL;
    echo "<views>".$entry->children(NS_MEDIA)->group->children(NS_MEDIA)->community->children(NS_MEDIA)->statistics->attributes(null)->views."</views>".PHP_EOL;
    echo "<pubDate>".$entry->published."</pubDate>".PHP_EOL;
    echo "</item>".PHP_EOL;
}

Hope this helps someone in the future. This is by far the simplest example of XML namespace parsing I have come across.
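For readers without the feed to hand, the nesting that this traversal relies on can be sketched as below. The one-entry feed is hypothetical, shaped only from what the snippet walks through (media:group containing media:thumbnail, media:description, and media:community/media:statistics), and the yt namespace URI is an assumption to be checked against the real feed. The sketch also resolves <media:group> once per entry and collects the values into a plain array, which touches on the question's request for an array.

```php
<?php
define('NS_ATOM', 'http://www.w3.org/2005/Atom');
define('NS_MEDIA', 'http://search.yahoo.com/mrss/');
// Assumption: copy the real URI from the feed's xmlns:yt attribute.
define('NS_YT', 'http://www.youtube.com/xml/schemas/2015');

// Hypothetical one-entry feed, shaped to match the traversal in the answer.
$feed = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:media="http://search.yahoo.com/mrss/"
      xmlns:yt="http://www.youtube.com/xml/schemas/2015">
  <entry>
    <yt:videoId>abc123</yt:videoId>
    <title>The Title</title>
    <link href="http://example.com" />
    <published>2017-01-29T00:00:00+00:00</published>
    <media:group>
      <media:thumbnail url="http://example.com/image.png" />
      <media:description>The Description</media:description>
      <media:community>
        <media:statistics views="123456" />
      </media:community>
    </media:group>
  </entry>
</feed>
XML;

$xml = new SimpleXMLElement($feed);
$items = [];

foreach ($xml->children(NS_ATOM)->entry as $entry) {
    // Resolve the children of <media:group> once instead of once per field.
    $group = $entry->children(NS_MEDIA)->group->children(NS_MEDIA);
    $stats = $group->community->children(NS_MEDIA)->statistics;

    $items[] = [
        'title'       => (string) $entry->title,
        'link'        => (string) $entry->link->attributes()->href,
        'image'       => (string) $group->thumbnail->attributes()->url,
        'description' => (string) $group->description,
        'guid'        => (string) $entry->children(NS_YT)->videoId,
        'views'       => (string) $stats->attributes()->views,
        'pubDate'     => (string) $entry->published,
    ];
}

print_r($items);
```

Casting each node to string detaches the values from the SimpleXML object, so the array can be passed around or serialized without the document.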

0

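Consider XSLT, XPath's sibling, since you are essentially transforming the original XML rather than parsing out selected values. With XSLT you do not need a foreach loop, and namespaces are handled fully.

In fact, as shown below, XSLT is the fastest of the approaches mentioned above (SimpleXML querying and XPath evaluation) when run against the posted XML wrapped in a <feed ...> root:

SimpleXML (from @IMSoP)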

$time_start = microtime(true); 

$xml = file_get_contents('YoutubeFeed.xml'); 
$document = new SimpleXMLElement($xml); 
define('NS_ATOM', 'http://www.w3.org/2005/Atom'); 
define('NS_MEDIA', 'http://search.yahoo.com/mrss/'); 

foreach ($document->children(NS_ATOM)->entry as $entry) { 
    echo "<item>".PHP_EOL; 
    echo "<title>".$entry->title."</title>".PHP_EOL; 
    echo "<link>".$entry->link->attributes(null)->href."</link>".PHP_EOL; 
    echo "<image>".$entry->children(NS_MEDIA)->thumbnail->attributes()->url."</image>".PHP_EOL; 
    echo "<description>".$entry->children(NS_MEDIA)->description."</description>".PHP_EOL; 
    echo "<guid>".$entry->children(NS_MEDIA)->guid."</guid>".PHP_EOL; 
    echo "<views>".$entry->children(NS_MEDIA)->statistics->attributes()->views."</views>".PHP_EOL; 
    echo "<pubDate>".$entry->pubDate."</pubDate>".PHP_EOL; 
    echo "</item>".PHP_EOL; 
} 

Timing

echo "SimpleXML: " . (microtime(true) - $time_start) ."\n"; 
# SimpleXML: 0.0014688968658447 

XPath (from @ThW)

$time_start = microtime(true); 

$xml = file_get_contents('YoutubeFeed.xml'); 
$document = new DOMDocument(); 
$document->loadXml($xml); 

$xpath = new DOMXpath($document); 
$xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom'); 
$xpath->registerNamespace('media', 'http://search.yahoo.com/mrss/'); 

foreach ($xpath->evaluate('//atom:entry') as $entry) { 
    echo "<item>".PHP_EOL; 
    echo "<title>". $xpath->evaluate('string(atom:title)', $entry)."</title>".PHP_EOL; 
    echo "<link>". $xpath->evaluate('string(atom:link/@href)', $entry)."</link>".PHP_EOL; 
    echo "<image>". $xpath->evaluate('string(media:thumbnail/@url)', $entry)."</image>".PHP_EOL; 
    echo "<description>". $xpath->evaluate('string(media:description)', $entry)."</description>".PHP_EOL; 
    echo "<guid>". $xpath->evaluate('string(media:guid)', $entry)."</guid>".PHP_EOL; 
    echo "<views>".$xpath->evaluate('string(media:statistics/@views)', $entry)."</views>".PHP_EOL; 
    echo "<pubDate>". $xpath->evaluate('string(atom:pubDate)', $entry)."</pubDate>".PHP_EOL; 
    echo "</item>".PHP_EOL; 
} 

Timing

echo "XPATH: " . (microtime(true) - $time_start) ."\n"; 
# XPATH: 0.0012829303741455 

XSLT

$time_start = microtime(true); 

$xml = file_get_contents('YoutubeFeed.xml'); 
$document = new DOMDocument(); 
$document->loadXml($xml); 

$xslstr = '<?xml version="1.0" encoding="UTF-8"?> 
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" 
       xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" 
       exclude-result-prefixes="atom media"> 
<xsl:output version="1.0" encoding="UTF-8" indent="yes" /> 
<xsl:strip-space elements="*"/> 

    <xsl:template match="atom:feed"> 
    <xsl:apply-templates select="atom:entry"/> 
    </xsl:template> 

    <xsl:template match="atom:entry"> 
     <item> 
     <title><xsl:value-of select="atom:title"/></title> 
     <link><xsl:value-of select="atom:link/@href"/></link> 
     <image><xsl:value-of select="media:thumbnail/@url"/></image> 
     <description><xsl:value-of select="media:description"/></description> 
     <guid><xsl:value-of select="media:guid"/></guid> 
     <views><xsl:value-of select="media:statistics/@views"/></views> 
     <pubDate><xsl:value-of select="atom:pubDate"/></pubDate> 
     </item> 
    </xsl:template> 
</xsl:stylesheet>'; 

$xsl = new DOMDocument; 
$xsl->loadXML($xslstr); 

// Configure the transformer 
$proc = new XSLTProcessor; 
$proc->importStyleSheet($xsl); 

// Transform XML source 
$newXML = $proc->transformToXML($document); 

// Echo string output 
echo $newXML; 

Timing

echo "XSLT: " . (microtime(true) - $time_start) ."\n"; 
# XSLT: 0.00098896026611328 

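Even with more <entry> nodes (the tag and its children copied until the file reached 500 lines), XSLT scales much better. The timings below are in seconds: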

# SimpleXML: 0.62154388427734 

# XPATH: 0.68382000923157 

# XSLT: 0.011976957321167 
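Single-run timings like these are sensitive to noise, so anyone repeating the comparison may want to run each approach several times. The sketch below is one possible way to do that and is not part of the measurements above: medianRuntime() is a hypothetical helper, hrtime() assumes PHP 7.3 or later, and the tiny inline feed is a stand-in for YoutubeFeed.xml. Output is buffered and discarded so that writing to the terminal does not dominate the numbers.

```php
<?php
/**
 * Run $task $runs times and return the median wall-clock time in seconds.
 * Hypothetical helper; for an even $runs it returns the upper median.
 */
function medianRuntime(callable $task, int $runs = 25): float
{
    $timings = [];
    for ($i = 0; $i < $runs; $i++) {
        ob_start();                        // capture anything the task echoes
        $start = hrtime(true);             // monotonic clock, in nanoseconds
        $task();
        $timings[] = (hrtime(true) - $start) / 1e9;
        ob_end_clean();                    // discard the captured output
    }
    sort($timings);
    return $timings[intdiv(count($timings), 2)];
}

// Stand-in input; replace with file_get_contents('YoutubeFeed.xml').
$xml = '<feed xmlns="http://www.w3.org/2005/Atom">'
     . '<entry><title>The Title</title></entry></feed>';

// Any of the three approaches can be wrapped in a closure like this one.
$seconds = medianRuntime(function () use ($xml) {
    $document = new SimpleXMLElement($xml);
    foreach ($document->children('http://www.w3.org/2005/Atom')->entry as $entry) {
        echo "<title>".$entry->title."</title>".PHP_EOL;
    }
});

printf("SimpleXML median: %.6f s\n", $seconds);
```

The median is used rather than the mean so that a single slow run, for example the first one that warms caches, does not skew the result.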