2017-04-03 50 views
2

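(C#) How can I modify attribute values in an existing XML file without loading or rewriting the whole file?

I am writing some huge XML files (several GB) with the help of XmlWriter and Linq2Xml. The files have this form: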

<Table recCount="" recLength=""> 
<Rec recId="1">..</Rec> 
<Rec recId="2">..</Rec> 
.. 
<Rec recId="n">..</Rec> 
</Table> 

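I do not know the values of the Table's recCount and recLength attributes until I have written all of the inner Rec nodes, so I can only write those attribute values at the very end.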

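Right now I write all of the inner Rec nodes to a temporary file, compute the attribute values, and then write everything to the final file in the form shown above (copying all the Rec nodes over from the temporary file).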

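Is there a way to modify these attribute values without writing to another file (as I do now) and without loading the whole document into memory (which is obviously impossible given the size of these files)?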

+0

Yes, it is possible, but you have to reserve some space for those numbers (you cannot "insert" bytes into a file, you can only overwrite them) – xanatos

+0

@xanatos Hm, okay, I suppose that would work too. How would I do that? –

+0

Can you change the XML format? Put 'count' and 'length' *elements* at the end. –

Answers

1

Heavily commented code follows. The basic idea is that in the first pass we write:

<?xml version="1.0" encoding="utf-8"?> 
<Table recCount="$1" recLength="$2"> 
<!--Reserved space:++++++++++++++++--> 
<Rec... 

Then we go back to the beginning of the file and rewrite the first three lines:

<?xml version="1.0" encoding="utf-8"?> 
<Table recCount="1000" recLength="150"> 
<!--Reserved space:#############--> 

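The important "trick" here is that you cannot "insert" into a file, you can only overwrite it. So we "reserve" some space for the digits (the Reserved space:############# comment). There are many ways we could do this. For example, in the first pass we could write: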

<Table recCount="    " recLength="   "> 

and then (legal XML, but ugly):

<Table recCount="1000   " recLength="150  "> 

Or we could put the extra space after the > of Table:

<Table recCount="" recLength="">     

(there are 20 spaces after the >)

Then:

<Table recCount="1000" recLength="150">    

(now there are 13 spaces after the >)

Or we could simply put the spaces on a new line, without the <!-- -->...
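The padded-attribute variant can be sketched without an XmlWriter at all. The following is a minimal sketch, not the answer's code: the class name PaddedAttributeSketch, the fixed Width of 10 and the hand-written record loop are assumptions made for illustration, and the header is kept ASCII-only so that one character is exactly one byte.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

// Hypothetical helper: reserves fixed-width attribute values and
// overwrites them in place once the real numbers are known.
public static class PaddedAttributeSketch
{
    // Reserved width per value: int.MaxValue has 10 digits.
    private const int Width = 10;

    public static void Write(string path, int records)
    {
        var encoding = new UTF8Encoding(false); // no BOM, so offsets start at 0

        string prefix = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\r\n<Table recCount=\"";
        string middle = "\" recLength=\"";
        string blank = new string(' ', Width);

        // Byte offsets of the two reserved areas inside the header.
        long countOffset = encoding.GetByteCount(prefix);
        long lengthOffset = countOffset + Width + encoding.GetByteCount(middle);

        int recCount = 0;
        int maxRecLength = 0;

        using (var fs = new FileStream(path, FileMode.Create, FileAccess.ReadWrite))
        {
            // First pass: header with blank placeholders, then the records.
            WriteText(fs, encoding, prefix + blank + middle + blank + "\">\r\n");

            for (int i = 0; i < records; i++)
            {
                // Sample payload without XML special characters (no escaping needed here).
                string value = "Some number: " + i.ToString(CultureInfo.InvariantCulture);
                maxRecLength = Math.Max(maxRecLength, value.Length);
                recCount++;
                WriteText(fs, encoding, "<Rec recId=\"" + recCount.ToString(CultureInfo.InvariantCulture)
                    + "\">" + value + "</Rec>\r\n");
            }

            WriteText(fs, encoding, "</Table>\r\n");

            // Second pass: overwrite only the reserved bytes; nothing else moves.
            Overwrite(fs, encoding, countOffset, recCount);
            Overwrite(fs, encoding, lengthOffset, maxRecLength);
        }
    }

    private static void Overwrite(FileStream fs, Encoding encoding, long offset, int value)
    {
        // Pad to the reserved width so exactly Width bytes are replaced.
        string text = value.ToString(CultureInfo.InvariantCulture).PadRight(Width);
        fs.Position = offset;
        WriteText(fs, encoding, text);
    }

    private static void WriteText(Stream stream, Encoding encoding, string text)
    {
        byte[] bytes = encoding.GetBytes(text);
        stream.Write(bytes, 0, bytes.Length);
    }
}
```

After PaddedAttributeSketch.Write("out.xml", 100), the header holds the two numbers padded to ten characters each. The trailing spaces remain part of the attribute values, so whoever reads the file has to tolerate them.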

The code (using the reserved comment line):

int maxRecCountLength = 10; // int.MaxValue.ToString().Length 
int maxRecLengthLength = 10; // int.MaxValue.ToString().Length 
int tokenLength = 4; // 4 == $1 + $2, see below what $1 and $2 are 
// Note that the reserved space will be in the form +++++++++++++++++++ 

string reservedSpace = new string('+', maxRecCountLength + maxRecLengthLength - tokenLength); 

// You have to manually open the FileStream 
using (var fs = new FileStream("out.xml", FileMode.Create)) 

// and add a StreamWriter on top of it 
using (var sw = new StreamWriter(fs, Encoding.UTF8, 4096, true)) 
{ 
    // Here you write on your StreamWriter however you want. 
    // Note that recCount and recLength have a placeholder $1 and $2. 
    int recCount = 0; 
    int maxRecLength = 0; 

    using (var xw = XmlWriter.Create(sw)) 
    { 
     xw.WriteWhitespace("\r\n"); 
     xw.WriteStartElement("Table"); 
     xw.WriteAttributeString("recCount", "$1"); 
     xw.WriteAttributeString("recLength", "$2"); 

     // You have to add some white space that will be 
     // partially replaced by the recCount and recLength value 
     xw.WriteWhitespace("\r\n"); 
     xw.WriteComment("Reserved space:" + reservedSpace); 

     // <--------- BEGIN YOUR CODE 
     for (int i = 0; i < 100; i++) 
     { 
      xw.WriteWhitespace("\r\n"); 
      xw.WriteStartElement("Rec"); 

      string str = string.Format("Some number: {0}", i); 
      if (str.Length > maxRecLength) 
      { 
       maxRecLength = str.Length; 
      } 
      xw.WriteValue(str); 

      recCount++; 

      xw.WriteEndElement(); 
     } 
     // <--------- END YOUR CODE 

     xw.WriteWhitespace("\r\n"); 
     xw.WriteEndElement(); 
    } 

    sw.Flush(); 

    // Now we read the first lines to modify them (normally we will
    // read three lines: the XML declaration, the <Table element and the
    // <!--Reserved space: comment)
    fs.Position = 0; 

    var lines = new List<string>(); 

    using (var sr = new StreamReader(fs, sw.Encoding, false, 4096, true)) 
    { 
     while (true) 
     { 
      string str = sr.ReadLine(); 
      lines.Add(str); 

      if (str.StartsWith("<Table")) 
      { 
       // We read the next line, the comment line 
       str = sr.ReadLine(); 
       lines.Add(str); 
       break; 
      } 
     } 
    } 

    string strCount = XmlConvert.ToString(recCount); 
    string strMaxRecLength = XmlConvert.ToString(maxRecLength); 

    // We do some replaces for the tokens 
    int oldLen = lines[lines.Count - 2].Length; 
    lines[lines.Count - 2] = lines[lines.Count - 2].Replace("=\"$1\"", string.Format("=\"{0}\"", strCount)); 
    lines[lines.Count - 2] = lines[lines.Count - 2].Replace("=\"$2\"", string.Format("=\"{0}\"", strMaxRecLength)); 
    int newLen = lines[lines.Count - 2].Length; 

    // Shrink the reserved filler by the number of characters the <Table line grew
    lines[lines.Count - 1] = lines[lines.Count - 1].Replace(":" + reservedSpace, ":" + new string('#', reservedSpace.Length - newLen + oldLen)); 

    // We move back to just after the UTF8/UTF16 preamble 
    fs.Position = sw.Encoding.GetPreamble().Length; 

    // And we rewrite the lines 
    foreach (string str in lines) 
    { 
     sw.Write(str); 
     sw.Write("\r\n"); 
    } 
} 
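The length bookkeeping in the last Replace call can be checked in isolation. This sketch (the class and method names are made up for illustration) applies the same two steps to the header lines as plain strings. It assumes ASCII-only content, so that character counts equal byte counts.

```csharp
using System;
using System.Globalization;

// Hypothetical helper mirroring the token replacement done in the answer's code.
public static class ReservedSpaceSketch
{
    // Returns { rewritten <Table ...> line, rewritten comment line }.
    // The combined length of the two lines stays the same.
    public static string[] Rewrite(string tableLine, string commentLine,
                                   string reserved, int recCount, int recLength)
    {
        string newTable = tableLine
            .Replace("=\"$1\"", "=\"" + recCount.ToString(CultureInfo.InvariantCulture) + "\"")
            .Replace("=\"$2\"", "=\"" + recLength.ToString(CultureInfo.InvariantCulture) + "\"");

        // How many characters the <Table ...> line grew by.
        int growth = newTable.Length - tableLine.Length;
        if (growth > reserved.Length)
        {
            throw new InvalidOperationException("Not enough reserved space.");
        }

        // Give those characters back by shortening the filler in the comment.
        string newComment = commentLine.Replace(
            ":" + reserved,
            ":" + new string('#', reserved.Length - growth));

        return new[] { newTable, newComment };
    }
}
```

With the header from the beginning of this answer, the values 1000 and 150 make the Table line three characters longer, so the 16 '+' characters become 13 '#' characters and everything after the comment stays at the same offset.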

The slower .NET 3.5 way

In .NET 3.5 the StreamReader/StreamWriter always close the underlying FileStream, so I have to reopen the file several times. This is a little slower.

int maxRecCountLength = 10; // int.MaxValue.ToString().Length 
int maxRecLengthLength = 10; // int.MaxValue.ToString().Length 
int tokenLength = 4; // 4 == $1 + $2, see below what $1 and $2 are 
// Note that the reserved space will be in the form +++++++++++++++++++ 

string reservedSpace = new string('+', maxRecCountLength + maxRecLengthLength - tokenLength); 
string fileName = "out.xml"; 

int recCount = 0; 
int maxRecLength = 0; 

using (var sw = new StreamWriter(fileName)) 
{ 
    // Here you write on your StreamWriter however you want. 
    // Note that recCount and recLength have a placeholder $1 and $2. 
    using (var xw = XmlWriter.Create(sw)) 
    { 
     xw.WriteWhitespace("\r\n"); 
     xw.WriteStartElement("Table"); 
     xw.WriteAttributeString("recCount", "$1"); 
     xw.WriteAttributeString("recLength", "$2"); 

     // You have to add some white space that will be 
     // partially replaced by the recCount and recLength value 
     xw.WriteWhitespace("\r\n"); 
     xw.WriteComment("Reserved space:" + reservedSpace); 

     // <--------- BEGIN YOUR CODE 
     for (int i = 0; i < 100; i++) 
     { 
      xw.WriteWhitespace("\r\n"); 
      xw.WriteStartElement("Rec"); 

      string str = string.Format("Some number: {0}", i); 
      if (str.Length > maxRecLength) 
      { 
       maxRecLength = str.Length; 
      } 
      xw.WriteValue(str); 

      recCount++; 

      xw.WriteEndElement(); 
     } 
     // <--------- END YOUR CODE 

     xw.WriteWhitespace("\r\n"); 
     xw.WriteEndElement(); 
    } 
} 

var lines = new List<string>(); 

using (var sr = new StreamReader(fileName)) 
{ 
    // Now we read the first lines to modify them (normally we will
    // read three lines: the XML declaration, the <Table element and the
    // <!--Reserved space: comment)

    while (true) 
    { 
     string str = sr.ReadLine(); 
     lines.Add(str); 

     if (str.StartsWith("<Table")) 
     { 
      // We read the next line, the comment line 
      str = sr.ReadLine(); 
      lines.Add(str); 
      break; 
     } 
    } 
} 

// We open the file with File.OpenWrite and wrap the stream, because
// new StreamWriter(fileName) would truncate the file and we only want
// to overwrite its first lines.
using (var fs = File.OpenWrite(fileName)) 
using (var sw = new StreamWriter(fs)) 
{ 
    string strCount = XmlConvert.ToString(recCount); 
    string strMaxRecLength = XmlConvert.ToString(maxRecLength); 

    // We do some replaces for the tokens 
    int oldLen = lines[lines.Count - 2].Length; 
    lines[lines.Count - 2] = lines[lines.Count - 2].Replace("=\"$1\"", string.Format("=\"{0}\"", strCount)); 
    lines[lines.Count - 2] = lines[lines.Count - 2].Replace("=\"$2\"", string.Format("=\"{0}\"", strMaxRecLength)); 
    int newLen = lines[lines.Count - 2].Length; 

    // Shrink the reserved filler by the number of characters the <Table line grew
    lines[lines.Count - 1] = lines[lines.Count - 1].Replace(":" + reservedSpace, ":" + new string('#', reservedSpace.Length - newLen + oldLen)); 

    // We move back to just after the UTF8/UTF16 preamble 
    sw.BaseStream.Position = sw.Encoding.GetPreamble().Length; 

    // And we rewrite the lines 
    foreach (string str in lines) 
    { 
     sw.Write(str); 
     sw.Write("\r\n"); 
    } 
} 
+0

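Somehow I run into a problem when I try to set fs.Position after using the StreamReader. I get the exception **Cannot access a closed file**. I tried passing an extra argument to the FileStream ctor (the one about other streams' access rights to the file), but it did not help. There are also problems with the ctors of FileStream, StreamWriter and StreamReader: the arguments are not accepted. I guess some things changed in later C# and .NET versions (I have to use 3 and 3.5). But anyway, I like your idea, and it works well. Thank you very much! –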

+1

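@PayshaYugen Normally a StreamReader automatically closes the underlying stream when it is disposed. You have to prevent it from doing that. The parameter I used (the last true, to be exact) is there for that reason. – xanatos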

+0

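Thanks for this information! I will check again later what is wrong with the arguments of the Stream constructors. I got rid of the 'StreamReader' by creating the 'lines' List at the start and saving the needed information into it while writing the data to the file. That way I do not have to read anything back afterwards. –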

0

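You could try loading the XML file into a DataSet, since that would make it easier to compute your attributes. Memory management is also handled by the DataSet layer. Why not give it a try and let us all know the result?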

1

Try the following approach.

You can set default values for the attributes in an external XML schema.

When creating the XML document, do not write these attributes at all. That is:

int count = 5; 
int length = 42; 

var writerSettings = new XmlWriterSettings { Indent = true }; 
using (var writer = XmlWriter.Create("data.xml", writerSettings)) 
{ 
    writer.WriteStartElement("Table"); 

    for (int i = 1; i <= count; i++) 
    { 
     writer.WriteStartElement("Rec"); 
     writer.WriteAttributeString("recId", i.ToString()); 
     writer.WriteString(".."); 
     writer.WriteEndElement(); 
    } 
} 

So the XML looks like this:

<?xml version="1.0" encoding="utf-8"?> 
<Table> 
    <Rec recId="1">..</Rec> 
    <Rec recId="2">..</Rec> 
    <Rec recId="3">..</Rec> 
    <Rec recId="4">..</Rec> 
    <Rec recId="5">..</Rec> 
</Table> 

Now create an XML schema for this file that specifies default values for the required attributes.

string ns = "http://www.w3.org/2001/XMLSchema"; 
using (var writer = XmlWriter.Create("data.xsd", writerSettings)) 
{ 
    writer.WriteStartElement("xs", "schema", ns); 

    writer.WriteStartElement("xs", "element", ns); 
    writer.WriteAttributeString("name", "Table"); 

    writer.WriteStartElement("xs", "complexType", ns); 
    writer.WriteStartElement("xs", "sequence", ns); 

    writer.WriteStartElement("xs", "any", ns); 
    writer.WriteAttributeString("processContents", "skip"); 
    writer.WriteAttributeString("maxOccurs", "unbounded"); 
    writer.WriteEndElement(); 

    writer.WriteEndElement(); 

    writer.WriteStartElement("xs", "attribute", ns); 
    writer.WriteAttributeString("name", "recCount"); 
    writer.WriteAttributeString("default", count.ToString()); // <-- 
    writer.WriteEndElement(); 

    writer.WriteStartElement("xs", "attribute", ns); 
    writer.WriteAttributeString("name", "recLength"); 
    writer.WriteAttributeString("default", length.ToString()); // <-- 
    writer.WriteEndElement(); 
} 

Or, more simply, create the schema like this:

XNamespace xs = "http://www.w3.org/2001/XMLSchema"; 

var schema = new XElement(xs + "schema", 
    new XElement(xs + "element", new XAttribute("name", "Table"), 
     new XElement(xs + "complexType", 
      new XElement(xs + "sequence", 
       new XElement(xs + "any", 
        new XAttribute("processContents", "skip"), 
        new XAttribute("maxOccurs", "unbounded") 
       ) 
      ), 
      new XElement(xs + "attribute", 
       new XAttribute("name", "recCount"), 
       new XAttribute("default", count) // <-- 
      ), 
      new XElement(xs + "attribute", 
       new XAttribute("name", "recLength"), 
       new XAttribute("default", length) // <-- 
      ) 
     ) 
    ) 
); 

schema.Save("data.xsd"); 

Note where the variables count and length are written: your own data should go there.

The generated schema will look like this:

<?xml version="1.0" encoding="utf-8"?> 
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> 
    <xs:element name="Table"> 
    <xs:complexType> 
     <xs:sequence> 
     <xs:any processContents="skip" maxOccurs="unbounded" /> 
     </xs:sequence> 
     <xs:attribute name="recCount" default="5" /> 
     <xs:attribute name="recLength" default="42" /> 
    </xs:complexType> 
    </xs:element> 
</xs:schema> 

Now, when reading the XML document, you must add this schema; the default attribute values will be taken from it.

XElement xml; 

var readerSettings = new XmlReaderSettings(); 
readerSettings.ValidationType = ValidationType.Schema; // <-- 
readerSettings.Schemas.Add("", "data.xsd"); // <-- 

using (var reader = XmlReader.Create("data.xml", readerSettings)) // <-- 
{ 
    xml = XElement.Load(reader); 
} 
xml.Save(Console.Out); 
Console.WriteLine(); 

Result:

<Table recCount="5" recLength="42"> 
    <Rec recId="1">..</Rec> 
    <Rec recId="2">..</Rec> 
    <Rec recId="3">..</Rec> 
    <Rec recId="4">..</Rec> 
    <Rec recId="5">..</Rec> 
</Table> 
+0

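I really appreciate your help. Your approach may be very useful in some cases, so I will keep it in mind. Unfortunately, in my case I only have to create the file and send it to my customers. They read it on their side, so I cannot change the way they read it. –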
