PowerShell throws a System.OutOfMemoryException when reading a large (50 MB) XML document

We are running the following script:

[xml]$products = Get-Content C:\fso\products.xml

and receive the following error:

System.OutOfMemoryException

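We assume this is because the XML file is huge. The solution might involve reading the XML one line at a time. How can we process this file? For example, how do we count the number of elements? Or how do we print the element names to the console window?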

Currently, we are looking at this link:

http://blogs.technet.com/b/stephap/archive/2009/05/27/choking-on-very-large-xml-files.aspx

The XML is structured as follows:

<?xml version="1.0" encoding="UTF-8"?> 
    <dataroot xmlns:od="urn:schemas-microsoft-com:officedata" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Products.xsd" generated="2014-01-21T08:21:41"> 
     <Products> 
      <upc>0000000000001</upc> 
      <description>BASICS $1.00</description> 
      <cost>0.6</cost> 
      <normal_price>1</normal_price> 
      <pricemethod>0</pricemethod> 
      <target_margin>0</target_margin> 
      <department>34</department> 
      <pack>1</pack> 
      <tax>3</tax> 
      <foodstamp>0</foodstamp> 
      <scale>0</scale> 
      <dsd>0</dsd> 
      <modified>2014-01-04T10:23:55</modified> 
      <cost_modified>2012-11-11T11:20:58</cost_modified> 
      <active>1</active> 
      <advertised>0</advertised> 
      <whomodified>170</whomodified> 
      <longdescription>TEAR ISSUE</longdescription> 
      <seconddescription>ROLL START</seconddescription> 
      <discount>1</discount> 
      <wicable>0</wicable> 
      <validage>0</validage> 
      <deleted>0</deleted> 
      <attributes>2056</attributes> 
      <Created>2005-02-16T09:53:00</Created> 
      <CreatedBy>1</CreatedBy> 
      <Points>0</Points> 
     </Products> 
     <Products> 
      <upc>0000000000357</upc> 
      <description>CHARMIN BATHROOM TISSUE</description> 
      <cost>5.81</cost> 
      <normal_price>7.99</normal_price> 
      <pricemethod>0</pricemethod> 
      <target_margin>0</target_margin> 
      <department>4</department> 
      <pack>1</pack> 
      <size>OVERLIMIT</size> 
      <tax>2</tax> 
      <foodstamp>0</foodstamp> 
      <scale>0</scale> 
      <dsd>0</dsd> 
      <modified>2010-06-30T23:55:00</modified> 
      <active>0</active> 
      <advertised>0</advertised> 
      <whomodified>30</whomodified> 
      <longdescription>CHARMIN BATHROOM TISSUE</longdescription> 
      <discount>1</discount> 
      <wicable>0</wicable> 
      <validage>0</validage> 
      <deleted>0</deleted> 
      <attributes>2048</attributes> 
      <Created>2005-02-16T09:53:00</Created> 
      <CreatedBy>1</CreatedBy> 
      <Points>0</Points> 
     </Products>

2014-01-21 Shaun Luttin

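I just tried to get the content of a 1.5 GB (yes, GB) XML file this way. It ended up filling the server's 70 GB of memory and then carried on into the page file. Casting to [xml] is an impossible memory hog... – Wouter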

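You are probably better off querying such documents with XPath. XPath can usually work in a streaming mode and does not need to load the whole document into a DOM tree.

See Select-Xml:

For example, to count all the elements in an XML file: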

Select-Xml -Path C:\fso\products.xml -Xpath "count(//*)"

This way you can pull out small fragments of the XML and still run your calculations on them afterwards.

See also: http://technet.microsoft.com/en-us/library/hh849968.aspx
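
If loading the document still uses too much memory, one alternative is the forward-only System.Xml.XmlReader class from .NET, which is not mentioned above. The following is only a minimal sketch, assuming the file is well-formed; Measure-XmlElement is a hypothetical helper name, not a built-in cmdlet.

```powershell
# Sketch: count elements with a forward-only reader instead of building a DOM.
# Measure-XmlElement is a hypothetical helper name, not a built-in cmdlet.
function Measure-XmlElement {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,
        [switch] $PrintNames   # also write each element name to the console
    )

    # XmlReader.Create resolves relative paths against the process working
    # directory, not the PowerShell location, so a full path is safest.
    $reader = [System.Xml.XmlReader]::Create($Path)
    $count = 0
    try {
        # Read() advances one node at a time; only the current node is held.
        while ($reader.Read()) {
            if ($reader.NodeType -eq [System.Xml.XmlNodeType]::Element) {
                $count++
                if ($PrintNames) { Write-Host $reader.Name }
            }
        }
    }
    finally {
        # Release the file handle even if parsing fails part-way through.
        $reader.Close()
    }
    $count
}
```

Calling Measure-XmlElement -Path C:\fso\products.xml -PrintNames would write every element name to the console and then return the total. Because the reader never keeps earlier nodes around, memory use should stay roughly flat regardless of file size.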

2014-01-21 17:33:01 jessehouwing

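This was easy to work with, so we are marking it as the answer. –

@jessehouwing: I would like to learn more about using streaming with XPath/Select-Xml. Do you happen to have any good resources? – Wouter

I tried this approach; on my 1.5 GB XML file it still used 8 GB of memory, but it finished after about 15 minutes. Nice performance! Is there a way to use streaming to reduce the memory usage... – Wouter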

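Reading one line at a time will be extremely slow on a file that size.

You can use Get-Content -ReadCount to process blocks of lines at a time (-ReadCount 1000 will give you arrays of 1000 lines).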
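
As a rough illustration of chunked reading, the sketch below counts records by matching text rather than parsing XML. It assumes every opening <Products> tag sits on its own line, as in the sample in the question; Measure-ProductRecord is a hypothetical helper name.

```powershell
# Sketch: count <Products> records by scanning the file in 1000-line chunks.
# Measure-ProductRecord is a hypothetical helper name. It treats the file as
# plain text and assumes each opening <Products> tag is on its own line.
function Measure-ProductRecord {
    param([Parameter(Mandatory = $true)] [string] $Path)

    $count = 0
    Get-Content -Path $Path -ReadCount 1000 | ForEach-Object {
        # $_ is a chunk of up to 1000 lines. Applied to an array, -cmatch
        # returns the matching lines rather than a Boolean, so counting the
        # result gives the number of opening tags in this chunk.
        $count += @(@($_) -cmatch '<Products>').Count
    }
    $count
}
```

Measure-ProductRecord -Path C:\fso\products.xml would then return the number of records. The closing </Products> tags are not counted because the pattern has no slash, and -cmatch keeps the comparison case-sensitive, as XML names are.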

2014-01-21 17:01:48 mjolinor

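The -ReadCount parameter is exactly what we needed. Thanks! –

Running [xml]$products = Get-Content -ReadCount 1 $xmlPath; we still get a System.OutOfMemoryException. What gives? –

You can't cast the result to [xml]. That only works if you're able to read the entire file. You need to read it as string data and process it using array operators like -match and -replace. – mjolinor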
