SimpleXML vs DOMDocument performance

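I'm building an RSS parser with the SimpleXML class, and I'm wondering whether switching to the DOMDocument class would make the parser faster. I'm parsing an RSS document of at least 1000 lines and I use almost all of that data. I'm looking for the approach that takes the least time to complete.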


2012-02-20 mhlas7

SimpleXML and DOMDocument both use the same parser (libxml2), so the difference in parsing speed between them is negligible.

This is easy to verify:

function time_load_dd($xml, $reps) {
    // warm-up: run 5 untimed iterations to prime caches
    for ($i = 0; $i < 5; ++$i) {
        $dom = new DOMDocument();
        $dom->loadXML($xml);
    }
    $start = microtime(true);
    for ($i = 0; $i < $reps; ++$i) {
        $dom = new DOMDocument();
        $dom->loadXML($xml);
    }
    $stop = microtime(true) - $start;
    return $stop;
}
function time_load_sxe($xml, $reps) {
    // same 5-iteration warm-up as above
    for ($i = 0; $i < 5; ++$i) {
        $sxe = simplexml_load_string($xml);
    }
    $start = microtime(true);
    for ($i = 0; $i < $reps; ++$i) {
        $sxe = simplexml_load_string($xml);
    }
    $stop = microtime(true) - $start;
    return $stop;
}


function main() {
    // This is a 1800-line atom feed of some complexity.
    $url = 'http://feeds.feedburner.com/reason/AllArticles';
    $xml = file_get_contents($url);
    $reps = 10000;
    $methods = array('time_load_dd', 'time_load_sxe');
    echo "Time to complete $reps reps:\n";
    foreach ($methods as $method) {
        echo $method, ": ", $method($xml, $reps), "\n";
    }
}
main();

On my machine I get essentially no difference:

Time to complete 10000 reps: 
time_load_dd: 17.725028991699 
time_load_sxe: 17.416455984116
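The shared libxml2 layer also shows up in the API. PHP can hand the same in-memory node back and forth between the two extensions with `dom_import_simplexml()` and `simplexml_import_dom()`, without parsing the document again. The minimal sketch below uses a tiny `<feed>` document that is made up purely for illustration.

```php
<?php
// A made-up one-element document, for illustration only.
$sxe = simplexml_load_string('<feed><title>Old</title></feed>');

// dom_import_simplexml() returns a DOMElement for the same libxml2 node;
// the document is not parsed a second time.
$domTitle = dom_import_simplexml($sxe->title);
$domTitle->nodeValue = 'New';

// The change made through DOM is visible through SimpleXML.
echo $sxe->title, "\n"; // prints "New"

// The reverse direction: simplexml_import_dom() wraps an existing DOM node.
$back = simplexml_import_dom($domTitle);
echo $back, "\n"; // prints "New"
```

Because both objects are views of one tree, you can pick whichever API is more convenient for a given step without paying for a second parse.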

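The real issue here is which algorithms you use and what you do with the data. 1000 lines is not a big XML document. Your slowdown will not come from memory usage or parsing speed but from your application logic.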


2012-02-20 23:33:27

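I'll also add that it's not only *parsing* that is the same: the most common tasks also perform nearly identically. If your application is slow with one, it will be slow with the other. – 2012-02-20 23:48:17

Thanks, this is a very good demonstration. I have one more question. What if I only want to get the value of a single tag from the feed? Which would be faster, or is the difference as negligible as the timings above? Thanks! – mhlas7 2012-02-21 03:44:16

You need to be more specific about what you are benchmarking. (For example, DOM/SXE don't have "tags"!) There are several ways to get an element: by traversal or by XPath, and any XPath has several equivalent forms that will perform differently. Why don't you benchmark it yourself? More importantly, have you ever actually *needed* to optimize? Most likely you don't need to worry about speed at all, and you are prematurely micro-optimizing. – 2012-02-21 15:35:05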
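The following takes up the suggestion to benchmark it. It is a small harness in the style of the answer above, and it times four roughly equivalent ways to read the text of one element. The mini feed, the keys of `$ways` and the repetition count are all made up for illustration. Timings will vary by machine and PHP version, so treat this as a starting point rather than a result.

```php
<?php
// Hypothetical mini RSS feed, used only for illustration.
$xml = <<<XML
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First</title></item>
    <item><title>Second</title></item>
  </channel>
</rss>
XML;

// Parse once with each API; only the lookups are timed below.
$sxe = simplexml_load_string($xml);
$dom = new DOMDocument();
$dom->loadXML($xml);
$xp = new DOMXPath($dom);

// Four roughly equivalent ways to read the channel <title> text.
$ways = [
    // SimpleXML property traversal.
    'sxe_traversal' => function () use ($sxe) {
        return (string) $sxe->channel->title;
    },
    // SimpleXML XPath: returns an array of SimpleXMLElement objects.
    'sxe_xpath' => function () use ($sxe) {
        return (string) $sxe->xpath('/rss/channel/title')[0];
    },
    // DOMXPath: wrapping the path in string() makes evaluate() return text.
    'dom_xpath' => function () use ($xp) {
        return $xp->evaluate('string(/rss/channel/title)');
    },
    // DOM by tag name: matches every <title> in document order,
    // so item(0) is the channel title only because it comes first here.
    'dom_tagname' => function () use ($dom) {
        return $dom->getElementsByTagName('title')->item(0)->textContent;
    },
];

$reps = 10000;
$results = [];
foreach ($ways as $name => $fn) {
    $start = microtime(true);
    for ($i = 0; $i < $reps; ++$i) {
        $results[$name] = $fn();
    }
    printf("%-14s %.4f s\n", $name, microtime(true) - $start);
}
```

The tag-name lookup is only equivalent in this example because of document order. On a real feed, an XPath anchored at `/rss/channel` states the intent more precisely.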


Well, I ran into a huge performance difference between DomDocument and SimpleXML. I have a ~15 MB XML file with about 50,000 elements like this:

... 
<ITEM> 
    <Product>some product code</Product> 
    <Param>123</Param> 
    <TextValue>few words</TextValue> 
</ITEM> 
...

I only need to "read" these values and store them in a PHP array. At first I tried DomDocument:

$dom = new DOMDocument(); 
$dom->loadXML($external_content); 
$root = $dom->documentElement; 

$xml_param_values = $root->getElementsByTagName('ITEM'); 
foreach ($xml_param_values as $item) { 
    $product_code = $item->getElementsByTagName('Product')->item(0)->textContent; 
    // ... some other operation 
}

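The script died with a "maximum execution time of 60 seconds exceeded" error. Only 15,000 of the 50k items had been parsed.

So I rewrote the code as a SimpleXML version: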

$xml = new SimpleXMLElement($external_content); 
foreach($xml->xpath('ITEM') as $item) { 
    $product_code = (string) $item->Product; 
    // ... some other operation 
}

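Everything finished within 1 second.

I don't know how these functions are implemented internally in PHP, but in my application (and with my XML structure) there really is a huge performance difference between DomDocument and SimpleXML.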


2015-03-10 13:52:19 Marek

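There is a huge difference between using xpath and getting elements by tag name. Judging by how these scripts behave, the xpath function doesn't actually return all the elements at once but gives you an iterator object, which is significantly lighter and faster. It's like loading a file: you can load a large file all at once or read it one line at a time. Reading line by line doesn't require loading everything into memory at once, so it performs better. – SteveB 2015-06-25 08:36:34

The comment above is correct: this isn't about DomDocument vs SimpleXML, it's about how you iterate. Changing the DomDocument iteration from getElementsByTagName to DOMXPath makes it just as fast. My tests on a file of about 120,000 elements confirmed this. – BobbyTables 2015-10-28 11:53:17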
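The following is a sketch of that change. It assumes a root element named `<ITEMS>`, because the real root name was elided in the answer, and it generates sample data of the same shape. `read_with_domxpath()` replaces the `getElementsByTagName()` loop with `DOMXPath`, and `read_with_simplexml()` follows the SimpleXML loop from the answer. All function names here are hypothetical. The slow version is deliberately not reproduced, and no particular timings are claimed.

```php
<?php
// Hypothetical generator for a document shaped like the one in the answer.
// The root name <ITEMS> is an assumption; the original elided it.
function build_sample(int $n): string {
    $parts = ['<?xml version="1.0"?>', '<ITEMS>'];
    for ($i = 0; $i < $n; ++$i) {
        $parts[] = "<ITEM><Product>P$i</Product><Param>$i</Param><TextValue>few words</TextValue></ITEM>";
    }
    $parts[] = '</ITEMS>';
    return implode("\n", $parts);
}

// DOM version that iterates with DOMXPath instead of getElementsByTagName().
function read_with_domxpath(string $xmlString): array {
    $dom = new DOMDocument();
    $dom->loadXML($xmlString);
    $xp = new DOMXPath($dom);
    $rows = [];
    // This query selects the ITEM children of the root element,
    // the same set SimpleXML's $xml->xpath('ITEM') returns from the root.
    foreach ($xp->query('/*/ITEM') as $item) {
        $rows[] = [
            // Relative expressions evaluated with $item as the context node.
            'product' => $xp->evaluate('string(Product)', $item),
            'param'   => (int) $xp->evaluate('string(Param)', $item),
        ];
    }
    return $rows;
}

// SimpleXML version, following the loop in the answer.
function read_with_simplexml(string $xmlString): array {
    $xml = new SimpleXMLElement($xmlString);
    $rows = [];
    foreach ($xml->xpath('ITEM') as $item) {
        $rows[] = [
            'product' => (string) $item->Product,
            'param'   => (int) $item->Param,
        ];
    }
    return $rows;
}

// Time both readers on ~50,000 generated items.
$sample = build_sample(50000);
foreach (['read_with_domxpath', 'read_with_simplexml'] as $fn) {
    $start = microtime(true);
    $rows = $fn($sample);
    printf("%s: %d items in %.3f s\n", $fn, count($rows), microtime(true) - $start);
}
```

Both readers return identical arrays, so any difference you measure comes from the lookup strategy rather than from the data being read.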
