How to get node paths with XMLReader

I know how to get the node path of a node through a DOM document:

$dom = new DOMDocument; 

$dom->loadXML('<fruits><fruit><name>Apple</name><name>Banana</name></fruit></fruits>'); 

foreach($dom->getElementsByTagName('*') as $node){ 
    // e.g. $node->getNodePath(); 
};

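My problem is: I need to get every node path plus the number of times it occurs in a file, and my files are very large.

An example file looks like this: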

<products> 
    <product> 
     <properties> 
      <property></property> 
      <property></property> 
     </properties> 
    </product> 
    ... 
</products>

The node <products> occurs 1 time (because it is the root node)
The node <product> occurs 60,000 times
The node <property> occurs 120,000 times (2 per product)

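Caveat: because every file is different, I do not know the name of the root node in advance! In this example it is <products>, but it could be something else. To get the name of the root node, I use this code: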

$simpleXML = simplexml_load_file(<-- filename goes here -->); 
$root = $simpleXML->getName();
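
Note that simplexml_load_file() parses the whole file into memory just to read one name, which may itself be costly for a large file. A minimal sketch of an alternative, assuming the built-in XMLReader extension is available; rootElementName() is a hypothetical helper name, and the inline XML string stands in for a real file (for a file, $reader->open($filename) would be used instead of $reader->XML()):

```php
<?php
// Hypothetical helper: return the name of the document element,
// reading only as far as the first element node.
function rootElementName(XMLReader $reader): ?string
{
    while ($reader->read()) {
        // Comments and processing instructions before the root are skipped.
        if ($reader->nodeType === XMLReader::ELEMENT) {
            return $reader->name;
        }
    }
    return null; // no element found
}

$reader = new XMLReader();
$reader->XML('<?xml version="1.0"?><!-- catalogue --><products><product/></products>');
$root = rootElementName($reader);
$reader->close();

echo $root, "\n"; // prints "products"
```

Because the reader stops at the first element, the rest of the document is never loaded.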

I found this repository: https://github.com/dkrnl/SimpleXMLReader

Then I used this code:

$nodes = array(); 
$counter = 0; 

$reader = new SimpleXMLReader; 

$reader->open(<!-- filename goes here -->); 

// $nodes and $counter are captured by reference so the callback can update them 
$reader->registerCallback($root, function($reader) use (&$nodes, &$counter) { 

    $xml = $reader->expandDomDocument(); 

    foreach ($xml->childNodes as $child) { 
        list($nodes, $counter) = getChildrenOfAllNodes($child, $nodes, $counter); 
    } 

}); 

$reader->parse(); 

$reader->close();

This is my getChildrenOfAllNodes function:

function getChildrenOfAllNodes(DOMNode $node, $nodes, $counter) { 

    foreach ($node->childNodes as $child) { 

        // Recurse into the subtree first 
        if ($child->hasChildNodes()) { 
            list($nodes, $counter) = getChildrenOfAllNodes($child, $nodes, $counter); 
        } 

        // Skip #text, #comment and similar non-element nodes 
        if (strpos($child->nodeName, '#') === false) { 

            if (array_key_exists($child->nodeName, $nodes)) { 
                $nodes[$child->nodeName]['count'] += 1; 
                $nodes[$child->nodeName]['path'] = $child->getNodePath(); 
            } else { 
                $nodes[$child->nodeName] = array( 
                    'name' => $child->nodeName, 
                    'path' => $child->getNodePath(), 
                    'count' => 1 
                ); 
            } 

            $counter++; 
        } 
    } 

    return array($nodes, $counter); 
}

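It works for files with about 1,000 nodes, but for files with more than 1,000 nodes it just keeps processing and never finishes.

My question is: is there a better solution for getting all node names plus node paths from very large XML files?

Thanks!

Source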

2017-08-10 Sam Leurs

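XMLReader is the way to go. But you should not expand the whole document (which is what happens in your example).

Use XMLReader::read() and XMLReader::next() to navigate to the node that represents your record (product). Expand that node into DOM and use DOM methods/XPath to fetch the data, and DOMNode::getNodePath() to fetch the partial node path.

You know the prefix of the path from the outer structure, so you supply it manually, for example by rewriting the partial path based on it.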

$reader = new XMLReader(); 
$reader->open('php://stdin'); 

$document = new DOMDocument(); 
$xpath = new DOMXpath($document); 

while ($reader->read() and $reader->localName != 'fruit') { 
} 

if ($reader->localName == 'fruit') { 
    $counter = 0; 
    do { 
        $fruit = $reader->expand($document); 
        $counter++; 
        foreach ($xpath->evaluate('name', $fruit) as $name) { 
            var_dump( 
                [ 
                    'name' => $name->textContent, 
                    'local_path' => $name->getNodePath(), 
                    'path' => preg_replace( 
                        '(^/(\w+))', '/fruits$2['.$counter.']', $name->getNodePath() 
                    ) 
                ] 
            ); 
        } 
    } while ($reader->next('fruit')); 
}

Output:

array(3) { 
    ["name"]=> 
    string(5) "Apple" 
    ["local_path"]=> 
    string(14) "/fruit/name[1]" 
    ["path"]=> 
    string(18) "/fruits[1]/name[1]" 
} 
array(3) { 
    ["name"]=> 
    string(6) "Banana" 
    ["local_path"]=> 
    string(14) "/fruit/name[2]" 
    ["path"]=> 
    string(18) "/fruits[1]/name[2]" 
}
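
In the pattern above the outer parentheses act as regex delimiters, so there is only one capture group and `$2` is empty. That is why the printed path reads /fruits[1]/name[1] and the `fruit` step is missing. If the full path is wanted, a minimal sketch of the rewrite could look like the following; absolutePath() and its parameters are hypothetical names introduced for illustration, not part of XMLReader or DOM:

```php
<?php
// Hypothetical helper: turn the path of an expanded record
// (e.g. "/fruit/name[2]") into a document-level path, given the
// parent path and the 1-based position of the record.
function absolutePath(string $localPath, string $parentPath, int $position): string
{
    // ${1} is the record's own element name, e.g. "fruit".
    return preg_replace(
        '~^/([\w.:-]+)~',
        $parentPath . '/${1}[' . $position . ']',
        $localPath
    );
}

echo absolutePath('/fruit/name[2]', '/fruits', 1), "\n";
// prints "/fruits/fruit[1]/name[2]"
```

The sketch assumes the parent path contains no `$` or backslash characters, since those would be interpreted inside the replacement string.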

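If you do not know the node names themselves, you will have to iterate over the structure, check the node types, and store the node names you find in variables.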

$nodeNames = [ 
    'list' => NULL, 
    'item' => NULL 
]; 
while ($reader->read()) { 
    if ($reader->nodeType == XML_ELEMENT_NODE) { 
        if (NULL === $nodeNames['list']) { 
            $nodeNames['list'] = $reader->localName; 
        } elseif (NULL === $nodeNames['item']) { 
            $nodeNames['item'] = $reader->localName; 
        } else { 
            break; 
        } 
    } 
} 

var_dump($nodeNames); 
if ($reader->nodeType == XML_ELEMENT_NODE && $reader->localName == $nodeNames['item']) { 
    $counter = 0; 
    do { 
        $item = $reader->expand($document); 
        var_dump($item->getNodePath()); 
    } while ($reader->next($nodeNames['item'])); 
}
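
If only the element names, paths, and counts are needed, it may not be necessary to expand anything into DOM at all. The following is a minimal sketch under that assumption: it keeps a stack of open element names keyed on XMLReader's depth property and counts each path in a single streaming pass. countElementPaths() is a hypothetical name, the paths carry no positional indexes (unlike DOMNode::getNodePath()), and the inline XML string stands in for a real file opened with $reader->open($filename).

```php
<?php
// Count how often each element path occurs, reading the document
// node by node without building a DOM tree.
function countElementPaths(XMLReader $reader): array
{
    $stack  = []; // names of the currently open elements, root first
    $counts = []; // path => number of occurrences

    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            // Drop names left over from deeper or sibling elements,
            // then push the current element at its own depth.
            $stack   = array_slice($stack, 0, $reader->depth);
            $stack[] = $reader->name;

            $path = '/' . implode('/', $stack);
            $counts[$path] = ($counts[$path] ?? 0) + 1;
        }
    }

    return $counts;
}

$xml = '<products>'
     . '<product><properties><property/><property/></properties></product>'
     . '<product><properties><property/><property/></properties></product>'
     . '</products>';

$reader = new XMLReader();
$reader->XML($xml);
$counts = countElementPaths($reader);
$reader->close();

print_r($counts);
```

For the sample above this yields a count of 1 for /products, 2 for /products/product, 2 for /products/product/properties, and 4 for /products/product/properties/property. The root name does not have to be known in advance, and memory use depends on the number of distinct paths rather than on the file size.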

Source

2017-08-10 19:05:09 ThW
