2015-06-25

I have a very large XML file that I need to keep reading with XmlReader. Here is a simplified version of the XML format:

<?xml version='1.0' encoding='UTF-8'?> 
<Sender> 
<SenderID>571099948</SenderID> 
<Sponsors> 
    <Sponsor> 
    <SponsorID>TEST01</SponsorID> 
    <Contracts> 
     <Contract> 
     <ContractID>000001</ContractID> 
     <Member> 
      <SSN>1111111111</SSN> 
      <Gender>M</Gender> 
      <Benefits> 
      <Benefit BenefitType="AAA"> 
      </Benefit> 
      <Benefit BenefitType="BBB"> 
      </Benefit> 
      </Benefits> 
     </Member> 
     <Member> 
      <SSN>4444444444</SSN> 
      <Gender>F</Gender> 
      <Benefits> 
      <Benefit BenefitType="AAA"> 
      </Benefit> 
      </Benefits> 
     </Member> 
     </Contract> 
     <Contract> 
     <ContractID>0000002</ContractID> 
     <Member> 
      <SSN>2222222222</SSN> 
      <Gender>F</Gender> 
      <Benefits> 
      <Benefit BenefitType="CCC"> 
      </Benefit> 
      <Benefit BenefitType="DDD"> 
      </Benefit> 
      </Benefits> 
     </Member> 
     </Contract> 
     <Contract> 
     <ContractID>0000003</ContractID> 
     <Member> 
      <SSN>333333333</SSN> 
      <Gender>F</Gender> 
      <Benefits> 
      <Benefit BenefitType="CCC"> 
      </Benefit> 
      </Benefits> 
     </Member> 
     </Contract> 
    </Contracts> 
    </Sponsor> 
    <Sponsor> 
    <SponsorID>TEST02</SponsorID> 
    <Contracts> 
     <Contract> 
     <ContractID>0000011</ContractID> 
     <Member> 
      <SSN>1111111111</SSN> 
      <Gender>M</Gender> 
      <Benefits> 
      </Benefits> 
     </Member> 
     </Contract> 
     <Contract> 
     <ContractID>0000002</ContractID> 
     <Member> 
      <SSN>2222222222</SSN> 
      <Gender>F</Gender> 
      <Benefits> 
      </Benefits> 
     </Member> 
     </Contract> 
    </Contracts> 
    </Sponsor> 
</Sponsors> 
</Sender> 

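I want to get all the information in each Contract node, together with the SponsorID from its parent Sponsor node. Here is the code I use to read the XML file piece by piece with XmlReader: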

 static IEnumerable<XElement> SimpleStreamAxis(string inputUrl, string elementName)  
    { 

      using (XmlReader reader = XmlReader.Create(inputUrl)) 
      { 
       reader.MoveToContent(); 
       while (reader.Read()) 
       { 
        if (reader.NodeType == XmlNodeType.Element) 
        { 
         if (reader.Name == elementName) 
         { 
          XElement el = XNode.ReadFrom(reader) as XElement; 
          if (el != null) 
          { 
           yield return el; 
          } 
         } 
        } 
       } 
      }     
    } 

Here is the problem: I can't use this, because an entire Sponsor subtree may be too large to hold in memory:

var sponsor = SimpleStreamAxis(file, "Sponsor"); 

I can't use this either, because with only the Contract node's information I can't tell which SponsorID it belongs to:

var contract = SimpleStreamAxis(file, "Contract"); 

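Is there a way to read the SponsorID of a Sponsor, move the cursor forward and read all the Contract nodes under that Sponsor, then move on to the next Sponsor and read its SponsorID and Contract nodes, and so on?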

Answers


Try this:

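// Requires: using System; using System.Xml; using System.Xml.Linq;
using (XmlReader xmlReader = XmlReader.Create("file.xml"))
{
    while (xmlReader.Read())
    {
        // Jump to the next <SponsorID> after the current position.
        if (xmlReader.ReadToFollowing("SponsorID"))
        {
            string sponsorId = xmlReader.ReadElementContentAsString();

            // process SponsorID
            Console.WriteLine(sponsorId);

            // Jump to the first <Contract> that follows, then walk its siblings.
            if (xmlReader.ReadToFollowing("Contract"))
            {
                do
                {
                    // ReadSubtree() exposes only this one <Contract>, so only it is loaded into memory.
                    // Disposing the subtree reader leaves xmlReader on </Contract>.
                    using (XmlReader contractSubtree = xmlReader.ReadSubtree())
                    {
                        XElement contractElement = XElement.Load(contractSubtree);

                        // process Contract
                        Console.WriteLine(contractElement.Element("ContractID"));
                    }
                } while (xmlReader.ReadToNextSibling("Contract"));
            }
        }
    }
}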
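One caveat with this loop: ReadToFollowing("Contract") is not limited to the current Sponsor, so if a Sponsor has no Contract elements, the first Contract of the next Sponsor would be reported under the wrong SponsorID. The following is a minimal sketch of a variant that scopes every search with ReadSubtree() so it cannot leave the current <Sponsor>. The class name SponsorScopedReader, the method ReadContracts and its tuple return type are hypothetical names chosen for illustration; like the loop above, it assumes SponsorID comes before the Contract elements and uses the element names from the sample file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class SponsorScopedReader
{
    // Hypothetical helper: yields one (SponsorID, <Contract> element) pair per contract.
    // Only one <Contract> is materialized at a time, never a whole <Sponsor>.
    public static IEnumerable<(string SponsorId, XElement Contract)> ReadContracts(TextReader input)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.ReadToFollowing("Sponsor"))
            {
                // The subtree reader reaches EOF at </Sponsor>, so a Sponsor without
                // contracts can never pick up the next Sponsor's contracts.
                using (XmlReader sponsor = reader.ReadSubtree())
                {
                    string sponsorId = null;
                    while (!sponsor.EOF)
                    {
                        if (sponsor.NodeType == XmlNodeType.Element && sponsor.LocalName == "SponsorID")
                        {
                            // Reads the text and advances past </SponsorID>.
                            sponsorId = sponsor.ReadElementContentAsString();
                        }
                        else if (sponsor.NodeType == XmlNodeType.Element && sponsor.LocalName == "Contract")
                        {
                            if (sponsorId == null)
                                throw new InvalidOperationException("<Contract> found before <SponsorID>.");
                            // ReadFrom() leaves the reader on the node after </Contract>,
                            // so Read() is deliberately not called on this path.
                            var contract = (XElement)XNode.ReadFrom(sponsor);
                            yield return (sponsorId, contract);
                        }
                        else
                        {
                            sponsor.Read();
                        }
                    }
                }
                // Disposing the subtree reader leaves `reader` on </Sponsor>.
            }
        }
    }
}
```

To read a file, open it with something like `using (var text = File.OpenText("file.xml"))` and pass `text` to ReadContracts; XmlReader.Create does not close a TextReader it was given, so the outer using is what releases the file.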

Yes, this can be done, assuming that SponsorID always comes before the Contract nodes.

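The basic idea is to read through the XML file until you find an element with one of the desired names, "SponsorID" or "Contract", and then yield it for higher-level processing: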

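public static IEnumerable<XElement> StreamNamedElements(XmlReader reader, IEnumerable<XName> names)
    {
     var nameSet = new HashSet<XName>(names);

     while (!reader.EOF)
     {
      // Use LocalName, not Name: Name includes any namespace prefix, which XName.Get() rejects.
      if (reader.NodeType == XmlNodeType.Element && nameSet.Contains(XName.Get(reader.LocalName, reader.NamespaceURI)))
      {
       // ReadFrom() already leaves the reader on the node after the element, so Read() is
       // not called on this path; otherwise an immediately adjacent sibling would be skipped.
       XElement el = XNode.ReadFrom(reader) as XElement;
       if (el != null)
        yield return el;
      }
      else
      {
       reader.Read();
      }
     }
    }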

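In the case where SponsorID is always present and comes before the Contract elements, this will enumerate them correctly. However, if a SponsorID is missing or out of order, a Contract may be paired with the SponsorID of the previous Sponsor. This error can be trapped by using ReadSubtree() to limit the scope of each "SponsorID" search to its containing "Sponsor" element: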

public static IEnumerable<XmlReader> StreamNamedSubtrees(XmlReader reader, IEnumerable<XName> names)
    {
     var nameSet = new HashSet<XName>(names);

     while (reader.Read())
     {
      // Use LocalName, not Name: Name includes any namespace prefix, which XName.Get() rejects.
      if (reader.NodeType == XmlNodeType.Element && nameSet.Contains(XName.Get(reader.LocalName, reader.NamespaceURI)))
      {
       var subReader = reader.ReadSubtree();
       yield return subReader;
       ((IDisposable)subReader).Dispose(); // Be sure to advance to the end of the subtree if the caller did not.
      }
     }
    }

and then use it like this:

 using (var sr = new StringReader(xml)) 
     using (var reader = XmlReader.Create(sr)) 
     { 
      foreach (var subReader in StreamNamedSubtrees(reader, new[] { (XName)"Sponsor" })) 
      { 
       XElement sponsorID = null; 
       foreach (var el in StreamNamedElements(subReader, new[] { (XName)"SponsorID", (XName)"Contract" })) 
       { 
        if (el.Name == "SponsorID") 
        { 
         sponsorID = el; 
        } 
        else if (el.Name == "Contract") 
        { 
         if (sponsorID == null) 
          throw new InvalidOperationException(); 
         // Example "higher processing" 
         Debug.WriteLine(string.Format("{0}: {1}", sponsorID.Value, el.ToString())); 
        } 
       } 
      } 
     } 

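Thanks! The problem with using a dictionary to keep the sponsorID is that whenever the sponsorID changes, it always yields an extra result pairing the new sponsorID with the old Contract. – seattleSummer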


@seattleSummer - Answer updated to fix the problem you found. Removing the dictionary actually made it simpler. – dbc


I don't see the point of using this loop: foreach (var subReader in StreamNamedSubtrees(reader, new[] { (XName)"Sponsor" })) – seattleSummer