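Large PHP for loop with SimpleXMLElement very slow: memory issues?

I currently have a bit of PHP code that pulls data from an XML file and creates a SimpleXML object using $products = new SimpleXMLElement($xmlString);. I then loop over it with a for loop, and inside that loop I set the product details for each product in the XML document. Each product is then saved to a MySQL database.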

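While this script runs, products are added less and less frequently until they eventually stop, before the full set has been inserted. I have tried running garbage collection at intervals, to no avail, as well as unsetting various variables, which does not seem to help.

Part of the code is shown below: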

<?php 
$servername = "localhost"; 
$username = "database.database"; 
$password = "demwke"; 
$database = "databasename"; 
$conn = new mysqli($servername, $username, $password, $database); 

$file = "large.xml"; 
$xmlString = file_get_contents($file); 
$products = new SimpleXMLElement($xmlString); 
unset($xmlString, $file); 
$total = count($products->datafeed[0]); 

echo 'Starting<br><br>'; 

for($i=0;$i<$total;$i++){ 
    $id = $products->datafeed->prod[$i]['id']; 
etc etc 
    $sql = "INSERT INTO products (id, name, uid, cat, prodName, brand, desc, link, imgurl, price, subcat) VALUES ('$id', '$store', '$storeuid', '$category', '$prodName', '$brand', '$prodDesc', '$link', '$image', '$price', '$subCategory')"; 
} 
echo '<br>Finished'; 
?>

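The other PHP variables are all defined with lines similar to the one for $id, but I removed them to make this easier to read.

Any ideas on what I can do or read to get this to complete? The time taken does not really matter to me, as long as it eventually finishes.

Source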

2015-04-20 Adam Moseley

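Can you explain "reduce in frequency until they eventually stop"? Perhaps add a piece of the XML structure to illustrate?

I have another page that checks the total number of rows in the database. After the first 5 seconds there are about 4000 rows; after another 5 seconds only about 2000 more have been added. The rate keeps falling until it is only about 10 per second.

Possible duplicate: http://stackoverflow.com/questions/18518602/stream-parse-4-gb-xml-file-in-php

Update: never use indexes with SimpleXML unless you have really few objects. Use foreach instead: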

// Before, with [index]: 
for ($i=0;$i<$total;$i++) { 
    $id = $products->datafeed->prod[$i]['id']; 
    ... 

// After, with foreach(): 
$i = 0; 
foreach ($products->datafeed->prod as $prod) { 
    $i++; // Remove if you don't actually need $i 
    $id = $prod['id']; 
    ...

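In general, ...->node[$i] accesses the node[] array and reads through it from the start up to the desired index, so iterating over the node array this way is not O(N) but O(N²). There is no workaround, because nothing guarantees that when you access item K you have just accessed item K-1 (recursively, and so on). foreach saves the pointer and therefore works in O(N).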
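To put rough numbers on that, here is a back-of-the-envelope count. It assumes a simplified cost model (my assumption, not a measurement of SimpleXML internals) in which reaching entry[i] takes about i + 1 steps from the start of the sibling list:

```latex
\[
\text{cost}_{\text{index}}(N) \approx \sum_{i=0}^{N-1} (i+1) = \frac{N(N+1)}{2},
\qquad
\text{cost}_{\text{foreach}}(N) \approx N
\]
\[
N = 100\,000:\quad
\frac{100\,000 \times 100\,001}{2} = 5\,000\,050\,000 \approx 5 \times 10^{9}
\quad\text{versus}\quad 10^{5}
\]
```

Under that model the indexed loop does on the order of fifty thousand times more work for 100000 entries, and the cost of each batch keeps growing the farther into the list you get.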

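For the same reason, it may be advantageous to foreach over the whole array even if you really need only a few known items (unless they are few and very near the beginning of the array):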

$a[0] = $products->datafeed->prod[15]['id']; 
    ... 
    $a[35] = $products->datafeed->prod[1293]['id']; 

// After, with foreach(): 
$want = [ 15, ... 1293 ]; 
$i = 0; 
foreach ($products->datafeed->prod as $prod) { 
    // $i++ tests the current zero-based position, so 15 matches prod[15] above 
    if (!in_array($i++, $want)) { 
        continue; 
    } 
    $a[] = $prod['id']; 
}

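You should first verify whether the increasing delay is caused by MySQLi or by the XML processing. Remove (comment out) the SQL query execution from the loop, and nothing else, to check whether the speed (which I assume will now be much higher... :-)) stays constant or shows the same decrease.

I suspect that the XML processing is the culprit, here: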

for($i=0;$i<$total;$i++){ 
    $id = $products->datafeed->prod[$i]['id'];

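...where you access an index that gets farther and farther into the SimpleXMLElement object. This might suffer from the Schlemiel the Painter problem.

The straight answer to your question, "how do I get the loop to complete, no matter the time", is "increase the memory limit and the max execution time".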
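As a minimal sketch of that straight answer, both limits can be raised at the top of the script. The values here are assumptions for a one-off import, not recommendations, and your host may not allow changing them at runtime:

```php
<?php
// Assumed values for a one-off import; tune them to your server.
ini_set('memory_limit', '512M'); // raise the per-script memory ceiling
set_time_limit(0);               // 0 removes the execution time limit
```

This only buys time and memory; it does not remove the slowdown itself.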

To improve performance, you can use a different interface to the feed object:

$i = -1; 
foreach ($products->datafeed->prod as $prod) { 
    $i++; 
    $id = $prod['id']; 
    ... 
}

Experimenting

I used this small program to read a large XML and iterate over its content:

// Stage 1. Create a large XML. 
$xmlString = '<?xml version="1.0" encoding="UTF-8" ?>'; 
$xmlString .= '<content><package>'; 
for ($i = 0; $i < 100000; $i++) { 
    $xmlString .= "<entry><id>{$i}</id><text>The quick brown fox did what you would expect</text></entry>"; 
} 
$xmlString .= '</package></content>'; 

// Stage 2. Load the XML. 
$xml = new SimpleXMLElement($xmlString); 

$tick = microtime(true); 
for ($i = 0; $i < 100000; $i++) { 
    $id = $xml->package->entry[$i]->id; 
    if (0 === ($id % 5000)) { 
     $t = microtime(true) - $tick; 
     print date("H:i:s") . " id = {$id} at {$t}\n"; 
     $tick = microtime(true); 
    } 
}

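After the XML has been generated, a loop parses it and prints how long it takes to iterate over 5000 elements. To verify that it really is a time delta, the time of day is printed as well. The delta should be approximately the time difference between the timestamps.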

21:22:35 id = 0 at 2.7894973754883E-5 
21:22:35 id = 5000 at 0.38135695457458 
21:22:38 id = 10000 at 2.9452259540558 
21:22:44 id = 15000 at 5.7002019882202 
21:22:52 id = 20000 at 8.0867099761963 
21:23:02 id = 25000 at 10.477082967758 
21:23:15 id = 30000 at 12.81209897995 
21:23:30 id = 35000 at 15.120756149292

So this is what happens: processing the XML array gets slower and slower.

This is mostly the same program, using foreach:

// Stage 1. Create a large XML. 
$xmlString = '<?xml version="1.0" encoding="UTF-8" ?>'; 
$xmlString .= '<content><package>'; 
for ($i = 0; $i < 100000; $i++) { 
    $xmlString .= "<entry><id>{$i}</id><text>The quick brown fox did ENTRY {$i}.</text></entry>"; 
} 
$xmlString .= '</package></content>'; 

// Stage 2. Load the XML. 
$xml = new SimpleXMLElement($xmlString); 

$i  = 0; 
$tick = microtime(true); 
foreach ($xml->package->entry as $data) { 
    // $id = $xml->package->entry[$i]->id; 
    $id = $data->id; 
    $i++; 
    if (0 === ($id % 5000)) { 
     $t = microtime(true) - $tick; 
     print date("H:i:s") . " id = {$id} at {$t} ({$data->text})\n"; 
     $tick = microtime(true); 
    } 
}

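The times now seem to be constant... I say "seem" because they appear to have decreased by a factor of about ten thousand, and I have some difficulty getting reliable measurements.

(And no, I had no idea; I had probably never used indexes with large XML arrays.)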

21:33:42 id = 0 at 3.0994415283203E-5 (The quick brown fox did ENTRY 0.) 
21:33:42 id = 5000 at 0.0065329074859619 (The quick brown fox did ENTRY 5000.) 
... 
21:33:42 id = 95000 at 0.0065121650695801 (The quick brown fox did ENTRY 95000.)

Source

2015-04-20 21:07:51 LSerni

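Thanks for this, the problem was the for loop. Changing to foreach allowed me to insert 55000 in under a second.

Can you check the following 2 steps? They may help you.

1) Increase the default PHP execution time from 30 seconds to something larger. 
    ini_set('max_execution_time', 300000); 

2) If that fails, try running your code through a cron job or another back-end process.

Source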

2015-04-20 19:00:58

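I have had the same problem before.

Split your large XML file into smaller files such as file1, file2, file3, and then process them.

You can split your XML file with a text editor that can open large files. Don't waste your time when splitting the file.

Edit: I found an answer for huge XML files. I think it is the best answer for this purpose: Parsing Huge XML Files in PHP

Source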

2015-04-20 19:12:32 hakiko

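Depending on how complex the XML structure is, this may not be such a simple fix, which is why I suggested just using PHP to keep track of where you are and continue where you left off on a later page load.

@JonathanKuhn I edited my answer, please take a look. – hakiko

Are you sure it is the XML file? There is no need to cut it up with a text editor; you can use a parser like XMLReader and process one main element at a time, if the XML file really is too large (judging from the error information given in the question, the XML is probably not the problem here). – hakre

You could try increasing the memory limit. If that is not an option and you only need to get this done once, I would personally just chunk it up and process 5k values at a time.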

<?php 
$servername = "localhost"; 
$username = "database.database"; 
$password = "demwke"; 
$database = "databasename"; 
$conn = new mysqli($servername, $username, $password, $database); 

$file = "large.xml"; 
$xmlString = file_get_contents($file); 
$products = new SimpleXMLElement($xmlString); 
unset($xmlString, $file); 

$total = count($products->datafeed[0]); 

//get your starting value for this iteration 
$start = isset($_GET['start'])?(int)$_GET['start']:0; 

//determine when to stop 
//process no more than 5k at a time 
$step = 5000; 
//where to stop, either after our step (max) or the end 
$limit = min($start+$step, $total); 

echo 'Starting<br><br>'; 

//modified loop so $i starts at our start value and stops at our $limit for this load. 
for($i=$start;$i<$limit;$i++){ 
    $id = $products->datafeed->prod[$i]['id']; 
etc etc 
    $sql = "INSERT INTO products (id, name, uid, cat, prodName, brand, desc, link, imgurl, price, subcat) VALUES ('$id', '$store', '$storeuid', '$category', '$prodName', '$brand', '$prodDesc', '$link', '$image', '$price', '$subCategory')"; 
} 

if($limit >= $total){ 
    echo '<br>Finished'; 
} else { 
    echo<<<HTML 
<html><head> 
<meta http-equiv="refresh" content="2;URL=?start={$limit}"> 
</head><body> 
Done processing {$start} through {$limit}. Moving on to next set in 2 seconds. 
</body></html> 
HTML; 
} 
?>

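As long as this is not something that a user loads (like a standard visitor to your site), there should be no problem.

Another option: have you tried properly preparing/binding your queries?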
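On the prepare/bind suggestion, a minimal sketch might look like the following. It assumes the products table from the question and a hypothetical <name> child element on each <prod> (the question elides how its other variables are read). productRows() and insertProducts() are illustrative names, not part of the original code.

```php
<?php
// Walk the feed once with foreach and yield plain string values per product.
function productRows(SimpleXMLElement $products): Generator
{
    foreach ($products->datafeed->prod as $prod) {
        // Cast to string so no SimpleXML objects are kept alive per row.
        yield [(string) $prod['id'], (string) $prod->name]; // <name> is an assumption
    }
}

// Prepare the statement once, bind once, then execute it for every row.
function insertProducts(mysqli $conn, iterable $rows): void
{
    $stmt = $conn->prepare('INSERT INTO products (id, prodName) VALUES (?, ?)');
    $id = $name = null;
    $stmt->bind_param('ss', $id, $name); // bound by reference
    foreach ($rows as $row) {
        $id   = $row[0];
        $name = $row[1];
        $stmt->execute(); // uses the current values of $id and $name
    }
    $stmt->close();
}
```

With the connection and feed from the question this would be called as insertProducts($conn, productRows($products));. Binding also avoids the quoting problems of interpolating values into the SQL string. Note that desc is a reserved word in MySQL, so a column with that name has to be written with backticks in the column list.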

Source

2015-04-20 19:22:52

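There are two problems to solve here:

Memory

At the moment you read the complete file into memory with file_get_contents() and parse it into an object structure with SimpleXML. Both operations load the whole file into memory.

A better solution is to use XMLReader: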

$reader = new XMLReader; 
$reader->open($file); 
$dom = new DOMDocument; 
$xpath = new DOMXpath($dom); 

// look for the first product element 
while ($reader->read() && $reader->localName !== 'product') { 
    continue; 
} 

// while you have a product element 
while ($reader->localName === 'product') { 
    // expand product element to a DOM node 
    $node = $reader->expand($dom); 
    // use XPath to fetch values from the node 
    var_dump(
        $xpath->evaluate('string(@category)', $node), 
        $xpath->evaluate('string(name)', $node), 
        $xpath->evaluate('number(price)', $node) 
    ); 
    // move to the next product sibling 
    $reader->next('product'); 
}
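The snippet above looks for <product> elements, while the feed in the question uses <prod> elements with an id attribute. Here is a sketch adapted to that shape, handing each product to SimpleXML one at a time. The <name> and <price> child elements, the sample data and the streamProducts() name are assumptions made for illustration.

```php
<?php
// Stream <prod> elements one at a time instead of loading the whole feed.
function streamProducts(string $path): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }
    // Advance to the first <prod> element, if there is one.
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'prod') {
            do {
                // Only this single product becomes a SimpleXML object.
                $prod = new SimpleXMLElement($reader->readOuterXml());
                yield [
                    'id'    => (string) $prod['id'],
                    'name'  => (string) $prod->name,  // assumed child element
                    'price' => (float) $prod->price,  // assumed child element
                ];
            } while ($reader->next('prod')); // skip to the next <prod>
            break;
        }
    }
    $reader->close();
}

// Demo with a tiny, made-up feed written to a temporary file.
$path = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents(
    $path,
    '<?xml version="1.0" encoding="UTF-8"?><root><datafeed>'
    . '<prod id="1"><name>Kettle</name><price>19.99</price></prod>'
    . '<prod id="2"><name>Toaster</name><price>24.50</price></prod>'
    . '</datafeed></root>'
);
// Collected into an array only for the demo; a real import would
// insert each row inside a foreach instead.
$rows = iterator_to_array(streamProducts($path), false);
unlink($path);
```

Because the generator yields plain arrays, memory use stays roughly flat no matter how many products the file contains.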

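Performance

Working with a lot of data takes time; doing it in a serial way takes even more.

Moving the script to the command line takes care of the timeouts. It is also possible to increase the limit with set_time_limit().

Another option is to optimize the inserts: collect some records and combine them into a single insert. This reduces the round trips/work on the database server but consumes more memory. You will have to find a balance.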

INSERT INTO table 
    (field1, field2) 
VALUES 
    (value1_1, value1_2), 
    (value2_1, value2_2), ...
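In PHP, building such a combined insert could be sketched as follows, with placeholders so the values are still bound rather than interpolated. The function names, the column list and the batch size of 500 are assumptions for illustration; each row is expected to be an array of values in column order.

```php
<?php
// Build "INSERT ... VALUES (?, ?), (?, ?), ..." for $rowCount rows.
function buildBatchInsert(string $table, array $columns, int $rowCount): string
{
    $quoted = array_map(function ($c) { return "`$c`"; }, $columns);
    $row    = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    return sprintf(
        'INSERT INTO `%s` (%s) VALUES %s',
        $table,
        implode(', ', $quoted),
        implode(', ', array_fill(0, $rowCount, $row))
    );
}

// Insert rows in chunks: one prepared multi-row statement per chunk.
function insertInBatches(mysqli $conn, array $rows, int $batchSize = 500): void
{
    $columns = ['id', 'prodName', 'price']; // hypothetical subset of columns
    foreach (array_chunk($rows, $batchSize) as $chunk) {
        $stmt = $conn->prepare(buildBatchInsert('products', $columns, count($chunk)));
        // Flatten the chunk into one flat list of values, in row order.
        $params = array_merge(...array_map('array_values', $chunk));
        $stmt->bind_param(str_repeat('s', count($params)), ...$params);
        $stmt->execute();
        $stmt->close();
    }
}
```

A larger batch size means fewer round trips but more memory held per statement, which is the balance mentioned above.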

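You can even write the SQL to a file and use the mysql command line tool to insert the records. This is really fast, but it has security implications because you need to use exec().

Source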

2015-04-21 09:36:29 ThW
