How to break down and parse specific Wikipedia text

I have the following working example, which retrieves a specific Wikipedia page and returns it as a SimpleXMLElement object:

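// Identify the client; Wikimedia's API may reject requests without a User-Agent.
ini_set('user_agent', '[email protected]');
$doc = new DOMDocument();
$doc->load('http://en.wikipedia.org/w/api.php?action=parse&page=Main%20Page&format=xml');

// Convert the DOM tree to SimpleXML for easier inspection.
$xml = simplexml_import_dom($doc);

print '<pre>';
print_r($xml);
print '</pre>';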

This returns:

SimpleXMLElement Object 
(
    [parse] => SimpleXMLElement Object 
     (
      [@attributes] => Array 
       (
        [title] => Main Page 
        [revid] => 472210092 
        [displaytitle] => Main Page 
       ) 

      [text] => <body><table id="mp-topbanner" style="width: 100%;"...

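Silly question, my mind has gone blank. What I want to do is capture the $xml->parse->text element and parse it in turn, so that what I ultimately get back is the following object. How do I achieve this?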

SimpleXMLElement Object 
(
    [body] => SimpleXMLElement Object 
     (
      [table] => SimpleXMLElement Object 
       (
        [@attributes] => Array 
         (
          [id] => mp-topbanner 
          [style] => width:100% ...

Source

2012-01-21 Michael Pasqualone

Perhaps you're looking for $doc->loadHTMLFile('http://en.wikipedia.org/'); ?

After grabbing a fresh cup of tea and eating a banana, here is the solution I came up with:

ini_set('user_agent', '[email protected]');
$doc = new DOMDocument();
$doc->load('http://en.wikipedia.org/w/api.php?action=parse&page=Main%20Page&format=xml');

// The <text> element holds the rendered page HTML as a plain string.
$nodes = $doc->getElementsByTagName('text');
$str = $nodes->item(0)->nodeValue;

// Parse that HTML string into a DOM document of its own.
$html = new DOMDocument();
$html->loadHTML($str);

This then lets me get the value of an element, which is what I was after. For example:

echo "Some value: "; 
echo $html->getElementById('someid')->nodeValue;
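The question asked for a SimpleXMLElement rather than a DOMDocument, so a possible extension of the approach above is to pass the second document through simplexml_import_dom() as well. The following is only a sketch under some assumptions: parseTextToSimpleXml() is a hypothetical helper name, the $sample string is a hand-written stand-in for the API response so the code runs offline, and libxml warnings are collected internally because the HTML fragment may not be strictly valid for libxml's parser.

```php
<?php
// Hypothetical helper (not part of the original answer): turns the HTML held
// in the <text> element of an action=parse response into a SimpleXMLElement.
function parseTextToSimpleXml(string $apiXml): SimpleXMLElement
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($apiXml)) {
        throw new RuntimeException('Could not parse the API response as XML');
    }

    $textNode = $doc->getElementsByTagName('text')->item(0);
    if ($textNode === null) {
        throw new RuntimeException('No <text> element in the API response');
    }

    // Collect libxml warnings internally instead of printing them,
    // then restore the previous error-handling setting.
    $previous = libxml_use_internal_errors(true);
    $html = new DOMDocument();
    $html->loadHTML($textNode->nodeValue);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // loadHTML() wraps the fragment in <html><body>, so the root is <html>.
    return simplexml_import_dom($html);
}

// Hand-written stand-in for the API response (assumption, for offline use).
$sample = '<api><parse title="Main Page"><text>'
    . '&lt;table id="mp-topbanner" style="width: 100%;"&gt;'
    . '&lt;tr&gt;&lt;td&gt;Welcome&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;'
    . '</text></parse></api>';

$xml = parseTextToSimpleXml($sample);
echo $xml->body->table['id'], "\n";
```

With a live response, the string passed in would be the body fetched from the api.php URL used above. Elements are then reachable as $xml->body->table and attributes as $xml->body->table['id'], which matches the object shape shown in the question.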

Source

2012-01-21 02:52:10
