2012-07-23

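Parsing .gz sitemaps

I want to parse a sitemap index that contains a list of sitemaps.

I successfully parsed sitemapindex.xml and got the list of .gz links, but what is the best way to open them as XML?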

    // Needs: System.IO, System.Net, System.Threading.Tasks, System.Xml
    string sitemap = "http://www.site.com/siteindex.xml";
    XmlDocument xml = new XmlDocument();
    xml.Load(sitemap);
    XmlNamespaceManager manager = new XmlNamespaceManager(xml.NameTable);
    manager.AddNamespace("s", xml.DocumentElement.NamespaceURI); // use the document's own namespace instead of a hard-coded URI
    XmlNodeList xnList = xml.SelectNodes("/s:sitemapindex/s:sitemap", manager);

    // parOptions is a ParallelOptions instance defined elsewhere
    var parallelLoop1 = xnList.Count;
    Parallel.For(0, parallelLoop1, parOptions, index =>
    {
        string name = xnList[index]["loc"].InnerText; // URL of one .gz sitemap
        System.Net.HttpWebRequest req = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(name);
        req.Timeout = 1000 * 60 * 60; // one hour, in milliseconds
        System.Net.WebResponse res = req.GetResponse();
        Stream responseStream = res.GetResponseStream();
        XmlDocument xml2 = new XmlDocument();
        xml2.Load(responseStream); // this is where it fails: the file is .gz, but XML is expected
        responseStream.Close();
        ......... more code
    });

Answer

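This is how I solved it: wrap the response stream in a GZipStream so the data is decompressed as the XML is loaded.

    GZipStream zip = new GZipStream(responseStream, CompressionMode.Decompress);
    XmlDocument xml2 = new XmlDocument();
    xml2.Load(zip);

And this is my final code: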

    // Needs: System.IO, System.IO.Compression, System.Net, System.Threading.Tasks, System.Xml
    string sitemap = "http://www.site.com/siteindex.xml";
    XmlDocument xml = new XmlDocument();
    xml.Load(sitemap);
    XmlNamespaceManager manager = new XmlNamespaceManager(xml.NameTable);
    manager.AddNamespace("s", xml.DocumentElement.NamespaceURI); // use the document's own namespace instead of a hard-coded URI
    XmlNodeList xnList = xml.SelectNodes("/s:sitemapindex/s:sitemap", manager);

    // parOptions is a ParallelOptions instance defined elsewhere
    var parallelLoop1 = xnList.Count;
    Parallel.For(0, parallelLoop1, parOptions, index =>
    {
        string name = xnList[index]["loc"].InnerText; // URL of one .gz sitemap
        System.Net.HttpWebRequest req = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(name);
        req.Timeout = 1000 * 60 * 60; // one hour, in milliseconds
        XmlDocument xml2 = new XmlDocument();
        // using blocks release the response and both streams even if Load throws
        using (System.Net.WebResponse res = req.GetResponse())
        using (Stream responseStream = res.GetResponseStream())
        using (GZipStream zip = new GZipStream(responseStream, CompressionMode.Decompress))
        {
            xml2.Load(zip); // the .gz payload is decompressed while it is parsed
        }
        ......... more code
    });
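
To try the decompression step without hitting a real server, here is a minimal self-contained sketch. It is not part of the original code: the class name `GzipSitemapSketch`, the helper `LoadGzippedXml`, and the sample sitemap content are made up for illustration, and an in-memory stream stands in for the HTTP response stream. It assumes the file really is gzip data, as the .gz extension suggests.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml;

public static class GzipSitemapSketch
{
    // Hypothetical helper: wraps any gzip-compressed stream in a GZipStream
    // and parses the decompressed bytes as XML.
    public static XmlDocument LoadGzippedXml(Stream compressed)
    {
        using (var zip = new GZipStream(compressed, CompressionMode.Decompress))
        {
            var doc = new XmlDocument();
            doc.Load(zip);
            return doc;
        }
    }

    public static void Main()
    {
        // Made-up sample sitemap, used only to exercise the helper.
        const string sample =
            "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">" +
            "<url><loc>http://www.site.com/page1</loc></url></urlset>";

        // Compress the sample in memory to stand in for a downloaded .gz file.
        var buffer = new MemoryStream();
        using (var gz = new GZipStream(buffer, CompressionMode.Compress, leaveOpen: true))
        {
            byte[] bytes = Encoding.UTF8.GetBytes(sample);
            gz.Write(bytes, 0, bytes.Length);
        }
        buffer.Position = 0; // rewind so the helper reads from the start

        XmlDocument doc = LoadGzippedXml(buffer);

        // Same namespace handling as the sitemap index code above.
        var manager = new XmlNamespaceManager(doc.NameTable);
        manager.AddNamespace("s", doc.DocumentElement.NamespaceURI);
        foreach (XmlNode loc in doc.SelectNodes("/s:urlset/s:url/s:loc", manager))
        {
            Console.WriteLine(loc.InnerText);
        }
    }
}
```

In the real loop, the stream returned by `res.GetResponseStream()` would be passed in place of the in-memory buffer.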

Please describe the changes you made so that others can learn from your sample. – 2012-07-25 19:26:07