2014-11-04

Converting a specific HTML structure to a specific XML structure

I have been struggling with this for quite a while.

I want to convert HTML to XML. The structures are shown below.

I am using HtmlAgilityPack to turn the HTML into a valid XML structure. After that step, my HTML looks like this:

<div class="menuItem1" video="" preview=""> 
    Menu 1 
    <div class="subMenu1"> 
     <div class="menuItem2" video="" preview=""> 
      Menu 2 
      <div class="subMenu2"> 
       <div class="menuItem3" video="" preview=""> 
        Menu 3 
        <div class="subMenu3"> 
         <div class="" video="" preview="">Menu 4</div> 
        </div> 
        <div class="treeExpand"></div> 
       </div> 
       <div class="menuItem3" video="" preview="">Menu 3</div> 
       <div class="menuItem3" video="" preview="">Menu 3</div> 
      </div> 
      <div class="treeExpand"></div> 
     </div> 
    </div> 
    <div class="treeExpand"></div> 
</div> 
<div class="menuItem1" video="" preview=""> 
    Menu 1 
    <div class="subMenu1"> 
     <div class="menuItem2" video="" preview=""> 
      Menu 2 
      <div class="subMenu2"> 
       <div class="menuItem3" video="" preview=""> 
        Menu 3 
        <div class="subMenu3"> 
         <div class="" video="" preview="">Menu 4</div> 
        </div> 
        <div class="treeExpand"></div> 
       </div> 
       <div class="menuItem3" video="" preview="">Menu 3</div> 
       <div class="menuItem3" video="" preview="">Menu 3</div> 
      </div> 
      <div class="treeExpand"></div> 
     </div> 
    </div> 
    <div class="treeExpand"></div> 
</div> 

This is exactly what I want. Now I can load it into an XElement with this C# code:

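// XDocument.Parse needs a single root element, and the fragment above has two
// top-level <div>s, so wrap it before parsing.
XDocument doc = XDocument.Parse("<root>" + THE_HTML_STRING_AS_SHOWN_ABOVE + "</root>"); 
XDocument docw = new XDocument(new XElement("Navigation", doc.Root.Elements())); 
XElement root = docw.Root; 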

I created a method that I pass the root into:

GenerateXmlFromHtml(root); 

The code of this method:

private string GenerateXmlFromHtml(XElement elem) 
{ 
    StringBuilder sbNavigationXml = new StringBuilder(); 
    try 
    { 
     //HTML will always have a video and preview, according to the generation of the html structure. 

     string text = string.Empty; 
     string videopath = string.Empty; 
     string previewpath = string.Empty; 
     XText textNode; 

     foreach (XElement element in elem.Elements()) 
     { 
      element.Name = "MenuItem"; //Change element name. 

      string htmlClass; 
      try { htmlClass = element.Attribute("class").Value; } 
      catch { htmlClass = ""; } 

      if (!string.IsNullOrEmpty(htmlClass)) 
      { 
       if (htmlClass.Contains("subMenu")) 
       { 
        element.AddBeforeSelf(element.Elements()); 
        element.Remove(); 
        GenerateXmlFromHtml(element); 
       } 
       else if (htmlClass.Contains("menuItem")) 
       { 
        textNode = element.Nodes().OfType<XText>().FirstOrDefault(); 
        text = textNode.Value; 
        videopath = element.Attribute("video").Value; 
        previewpath = element.Attribute("preview").Value; 

        if (element.HasElements) 
        { 
         sbNavigationXml.AppendLine("<MenuItem Text=\"" + text + "\" VideoPath=\"" + videopath + "\" PreviewPath=\"" + previewpath + "\">"); 
         sbNavigationXml.AppendLine(GenerateXmlFromHtml(element)); 
         sbNavigationXml.AppendLine("</MenuItem>"); 
        } 
        else 
        { 
         sbNavigationXml.AppendLine("<MenuItem Text=\"" + text + "\" VideoPath=\"" + videopath + "\" PreviewPath=\"" + previewpath + "\" />"); 
        } 
       } 
       else if (htmlClass.Contains("treeExpand")) 
       { 
        element.AddBeforeSelf(element.Elements()); 
        element.Remove(); 
        GenerateXmlFromHtml(element); 
       } 
      } 
      else 
      { 
       element.AddBeforeSelf(element.Elements()); 
       element.Remove(); 
       GenerateXmlFromHtml(element); 
      } 
     } 
    } 
    catch (Exception) 
    { 
     throw; 
    } 
    return sbNavigationXml.ToString(); 
} 

In the end, I want this to produce the following XML output:

<Navigation> 
    <MenuItem Text="Menu 1" VideoPath="" PreviewPath=""> 
        <MenuItem Text="Menu 2"> 
            <MenuItem Text="Menu 3"> 
                <MenuItem Text="Menu 4" VideoPath="" PreviewPath="" /> 
            </MenuItem> 
            <MenuItem Text="Menu 3" /> 
            <MenuItem Text="Menu 3" /> 
        </MenuItem> 
    </MenuItem> 
    <MenuItem Text="Menu 1" VideoPath="" PreviewPath=""> 
        <MenuItem Text="Menu 2"> 
            <MenuItem Text="Menu 3"> 
                <MenuItem Text="Menu 4" VideoPath="" PreviewPath="" /> 
            </MenuItem> 
            <MenuItem Text="Menu 3" /> 
            <MenuItem Text="Menu 3" /> 
        </MenuItem> 
    </MenuItem> 
</Navigation> 

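In other words, the subMenu divs and the treeExpand divs should disappear, and then I want to generate the XML. So far, though, I keep failing miserably. Please ask if anything is unclear. Any help is appreciated!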
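For comparison, here is a minimal sketch of the same transformation using LINQ to XML (XElement/XAttribute) instead of string concatenation. This approach also handles escaping of attribute values. It makes the following assumptions:

- The class names are the ones shown above (menuItemN, subMenuN, treeExpand).
- Every other div is a menu item. This includes the class="" div that holds "Menu 4".
- MenuConverter, ToNavigation and ConvertChildren are hypothetical names.

Like the string-based method, it writes VideoPath and PreviewPath on every item.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not from the question): converts the div tree into
// <MenuItem> elements with LINQ to XML instead of building strings.
public static class MenuConverter
{
    // Wraps the converted children of htmlRoot in a <Navigation> element.
    public static XElement ToNavigation(XElement htmlRoot)
    {
        return new XElement("Navigation", ConvertChildren(htmlRoot));
    }

    private static IEnumerable<XElement> ConvertChildren(XElement parent)
    {
        foreach (XElement div in parent.Elements())
        {
            // A missing class attribute is treated like class="".
            string cls = (string)div.Attribute("class") ?? string.Empty;

            if (cls.Contains("treeExpand"))
            {
                continue; // expander widgets carry no data: drop them
            }

            if (cls.Contains("subMenu"))
            {
                // Flatten the wrapper: promote its converted children one level up.
                foreach (XElement item in ConvertChildren(div))
                {
                    yield return item;
                }
                continue;
            }

            // Any other div (menuItemN, or the class="" div holding "Menu 4")
            // becomes a MenuItem; only the div's own text nodes form the label.
            string text = string.Concat(div.Nodes().OfType<XText>().Select(t => t.Value)).Trim();

            yield return new XElement("MenuItem",
                new XAttribute("Text", text),
                new XAttribute("VideoPath", (string)div.Attribute("video") ?? string.Empty),
                new XAttribute("PreviewPath", (string)div.Attribute("preview") ?? string.Empty),
                ConvertChildren(div)); // nested items (via subMenu) become child elements
        }
    }
}
```

To use it, wrap the fragment in a single root before parsing. For example, call MenuConverter.ToNavigation(XElement.Parse("<root>" + html + "</root>")) and then call ToString() on the result to get the XML text.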

==================================================================================================

Edit: I fixed the recursive method, for anyone who wants to see it:

private string GenerateXmlFromHtml(XElement elem) 
{ 
    //HTML will always have a video and preview, according to the generation of the html structure. 
    StringBuilder sbNavigationXml = new StringBuilder(); 
    string text = string.Empty; 
    string videopath = string.Empty; 
    string previewpath = string.Empty; 
    XText textNode; 

    try 
    { 
     foreach (XElement element in elem.Elements()) 
     { 
      //element.Name = "MenuItem"; //Change element name. 
      string htmlClass; 
      try { htmlClass = element.Attribute("class").Value; } 
      catch { htmlClass = ""; } 

      if (!string.IsNullOrEmpty(htmlClass)) 
      { 
       if (htmlClass.Contains("subMenu")) 
       { 
        if (element.HasElements) 
        { 
         sbNavigationXml.AppendLine(GenerateXmlFromHtml(element)); 
        } 
       } 
       else if (htmlClass.Contains("menuItem")) 
       { 
        textNode = element.Nodes().OfType<XText>().FirstOrDefault(); //Get node Text attribute value. 
        text = textNode == null ? string.Empty : textNode.Value.Trim(); //Strip the surrounding newlines/indentation. 
        videopath = (string)element.Attribute("video") ?? string.Empty; //Get node VideoPath attribute value. 
        previewpath = (string)element.Attribute("preview") ?? string.Empty; //Get node PreviewPath attribute value. 

        if (element.HasElements) 
        { 
         sbNavigationXml.AppendLine("<MenuItem Text=\"" + text + "\" VideoPath=\"" + videopath + "\" PreviewPath=\"" + previewpath + "\">"); 
         sbNavigationXml.AppendLine(GenerateXmlFromHtml(element)); 
         sbNavigationXml.AppendLine("</MenuItem>"); 
        } 
        else 
        { 
         sbNavigationXml.AppendLine("<MenuItem Text=\"" + text + "\" VideoPath=\"" + videopath + "\" PreviewPath=\"" + previewpath + "\" />"); 
        } 
       } 
       else if (htmlClass.Contains("treeExpand")) 
       { 
        //DO NOTHING 
       } 
      } 
      else 
      { 
       if (element.HasElements) 
       { 
        sbNavigationXml.AppendLine(GenerateXmlFromHtml(element)); 
       } 
      } 
     } 
    } 
    catch (Exception) 
    { 
     throw; 
    } 
    return sbNavigationXml.ToString(); 
} 

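Side note: people usually get this wrong the other way around. They parse HTML with regular expressions but still build the XML with a proper API. Is there a reason you need to build the XML by string concatenation? – 2014-11-04 15:12:57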


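@AlexeiLevenkov - No, I can do whatever I want... this is just the path I took. Anything else that produces the XML output is fine, even if I have to do something completely different. – 2014-11-04 15:14:26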


See [How can I build XML in C#?](http://stackoverflow.com/questions/284324/how-can-i-build-xml-in-c) for guidance. – 2014-11-04 15:15:34
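Here is a minimal sketch illustrating why an XML API is safer than string concatenation. EscapingDemo and its methods are hypothetical names. It builds the same element both ways for a label containing & or ". The concatenated string is not well-formed XML, while XAttribute escapes the value so that it round-trips when parsed.

```csharp
using System.Xml.Linq;

// Hypothetical demo: the same <MenuItem> built two ways.
public static class EscapingDemo
{
    // String concatenation, as in the question: the value is inserted raw,
    // so & or " in the label produce malformed XML.
    public static string Concatenated(string text)
    {
        return "<MenuItem Text=\"" + text + "\" />";
    }

    // LINQ to XML: XAttribute escapes the value when the element is serialized.
    public static string WithXElement(string text)
    {
        return new XElement("MenuItem", new XAttribute("Text", text)).ToString();
    }
}
```

For example, the label Tom & "Jerry" parses back unchanged from WithXElement. The same label makes XElement.Parse throw an XmlException on the output of Concatenated.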

Answer


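Try keeping the input and the output in separate documents.

Then walk the input and write the output, in the format you want, into your output XmlDocument (a separate variable).

Something like...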

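using System.Xml; 

class Converter 
{ 
    public XmlDocument Convert(XmlDocument inputDocument) 
    { 
     XmlDocument result = new XmlDocument(); 
     // A new XmlDocument has no DocumentElement yet, so create the output root first. 
     XmlElement root = result.CreateElement("Navigation"); 
     result.AppendChild(root); 
     ConvertNode(inputDocument.DocumentElement, root, result); 
     return result; 
    } 

    public void ConvertNode(XmlNode inputNode, XmlNode outputNode, XmlDocument outputDoc) 
    { 
     XmlNode newNode = null; 

     // check element class (text nodes have no Attributes collection) 
     XmlAttribute classAttr = inputNode.Attributes == null ? null : inputNode.Attributes["class"]; 
     string htmlClass = classAttr == null ? "" : classAttr.Value; 

     if (!string.IsNullOrWhiteSpace(htmlClass)) 
     { 
      if (htmlClass.Contains("menuItem")) 
      { 
       newNode = outputDoc.CreateElement("MenuItem"); 
       outputNode.AppendChild(newNode); 
      } 

      // check other wanted nodes etc.. 
     } 

     // Unmatched wrappers (the root, subMenu divs) are skipped, but their children 
     // are still converted into the current output node. 
     XmlNode target = newNode ?? outputNode; 
     foreach (XmlNode node in inputNode.ChildNodes) 
     { 
      ConvertNode(node, target, outputDoc); 
     } 
    } 
} 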

I am using a parser. That is how I turn the HTML into a valid XML structure, and then I use XElement to process it as normal XML. – 2014-11-04 15:24:07


I don't see how this answers the question... did you read the code in the post? – 2014-11-04 15:30:19


@AlexeiLevenkov I think the separation still applies. I changed the important part. – rodrigogq 2014-11-04 15:34:09