2012-01-21

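How to break down and parse the text of a specific Wikipedia page

I have the following working example, which retrieves a specific Wikipedia page and returns it as a SimpleXMLElement object: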

ini_set('user_agent', '[email protected]'); 
$doc = new DOMDocument(); 
$doc->load('http://en.wikipedia.org/w/api.php?action=parse&page=Main%20Page&format=xml'); 

$xml = simplexml_import_dom($doc); 

print '<pre>'; 
print_r($xml); 
print '</pre>'; 

This returns:

SimpleXMLElement Object 
(
    [parse] => SimpleXMLElement Object 
     (
      [@attributes] => Array 
       (
        [title] => Main Page 
        [revid] => 472210092 
        [displaytitle] => Main Page 
       ) 

      [text] => <body><table id="mp-topbanner" style="width: 100%;"... 

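Silly question, my mind has gone blank. What I want to do is capture the $xml->parse->text element and parse it in turn, so that what I ultimately get back is the following object. How do I achieve this?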

SimpleXMLElement Object 
(
    [body] => SimpleXMLElement Object 
     (
      [table] => SimpleXMLElement Object 
       (
        [@attributes] => Array 
         (
          [id] => mp-topbanner 
          [style] => width:100% ... 

Perhaps you are looking for $doc->loadHTMLFile('http://en.wikipedia.org/'); ?

Answer


After grabbing a fresh cup of tea and eating a banana, here is the solution I came up with:

ini_set('user_agent','[email protected]'); 
$doc = new DOMDocument(); 
$doc->load('http://en.wikipedia.org/w/api.php?action=parse&page=Main%20Page&format=xml'); 
$nodes = $doc->getElementsByTagName('text'); 

$str = $nodes->item(0)->nodeValue; 

$html = new DOMDocument(); 
$html->loadHTML($str); 

This then lets me get the value of an element, which is what I was after. For example:

echo "Some value: "; 
echo $html->getElementById('someid')->nodeValue;
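
For anyone who wants to try the two-step parse without hitting the API, here is a minimal sketch. The $apiXml string is a made-up stand-in for the real response (its shape is assumed from the print_r output in the question), and the variable names are only illustrative. It also passes the parsed HTML through simplexml_import_dom(), which should give roughly the body/table object the question asks for.

```php
<?php
// Hypothetical stand-in for the API response; the real document would come
// from $doc->load('http://en.wikipedia.org/w/api.php?action=parse&...').
$apiXml = <<<'XML'
<api>
  <parse title="Main Page" displaytitle="Main Page">
    <text>&lt;table id="mp-topbanner" style="width: 100%;"&gt;&lt;tr&gt;&lt;td&gt;Welcome&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</text>
  </parse>
</api>
XML;

// Step 1: parse the outer XML and pull out the escaped HTML as a string.
$doc = new DOMDocument();
$doc->loadXML($apiXml);
$str = $doc->getElementsByTagName('text')->item(0)->nodeValue;

// Step 2: parse that string as HTML. Real-world markup tends to trigger
// parser warnings, so collect them internally instead of printing them.
$previous = libxml_use_internal_errors(true);
$html = new DOMDocument();
$html->loadHTML($str);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Optional: view the parsed HTML as a SimpleXMLElement (root is <html>).
$sx = simplexml_import_dom($html);
$tableId = (string) $sx->body->table['id'];

// Or stay with DOM and look the element up by its id, as in the answer.
$cellText = trim($html->getElementById('mp-topbanner')->textContent);

echo "Table id: $tableId\n";
echo "Cell text: $cellText\n";
```

One thing to watch with real pages: loadHTML() falls back to ISO-8859-1 when the markup declares no charset, so UTF-8 text from Wikipedia may come out garbled unless an encoding hint is added to the string first.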