Copy only the HTML from mixed XML and HTML

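We have a bunch of files that are HTML pages containing extra XML elements (all prefixed with our company name 'TLA'). These elements provide data and structure for an old program that I am now rewriting.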

Example form:

<html > 
<head> 
    <title>Highly Simplified Example Form</title> 
</head> 
<body> 
    <TLA:document xmlns:TLA="http://www.tla.com"> 
     <TLA:contexts> 
      <TLA:context id="id_1" value=""></TLA:context> 
     </TLA:contexts> 
     <TLA:page> 
      <TLA:question id="q_id_1"> 
       <table> 
        <tr> 
         <td> 
          <input id="input_id_1" type="text" /> 
         </td> 
        </tr> 
       </table> 
      </TLA:question> 
     </TLA:page> 
     <!-- Repeat many times --> 
    </TLA:document> 
</body> 
</html>

My task is to write a preprocessor that copies only the HTML elements, complete with their attributes and content, into a new file.

Like this:

<html > 
<head> 
    <title>Highly Simplified Example Form</title> 
</head> 
<body> 
    <table> 
     <tr> 
      <td> 
       <input id="input_id_1" type="text" /> 
      </td> 
     </tr> 
    </table> 
    <!-- Repeat many times --> 
</body> 
</html>

I have taken the approach of using XSLT, since that is what I needed to extract the TLA elements for a different file. So far, this is the XSLT I have:

<?xml version="1.0" encoding="utf-8"?> 
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl" 
    xmlns:mbl="http://www.tla.com"> 
    <xsl:output method="xml" indent="yes"/> 
    <xsl:strip-space elements="*" /> 
    <xsl:template match="mbl:* | mbl:*/@* | mbl:*/text()"/> 
    <xsl:template match="@*|node()"> 
    <xsl:copy> 
     <xsl:apply-templates select="@*|node()"/> 
    </xsl:copy> 
    </xsl:template>  
</xsl:stylesheet>

However, this only produces the following:

<html > 
<head> 
    <title>Highly Simplified Example Form</title> 
</head> 
<body> 
</body> 
</html>

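As you can see, everything inside the TLA:document element has been excluded. What do I need to change in the XSLT to keep all of the HTML while filtering out the TLA elements?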

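Alternatively, is there a simpler way to solve this? I know that almost every browser ignores the TLA elements, so is there a way to get what I need using an HTML tool or application?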


2013-04-05 Clara Onager

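Targeting HTML elements specifically would be difficult, but if you just want to exclude content from the TLA namespace (while still including any non-TLA elements that the TLA elements contain), then this should work: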

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    xmlns:mbl="http://www.tla.com" exclude-result-prefixes="mbl"> 
    <xsl:output method="xml" indent="yes"/> 
    <xsl:strip-space elements="*" /> 

    <xsl:template match="@*|node()" priority="-2"> 
    <xsl:copy> 
     <xsl:apply-templates select="@*|node()"/> 
    </xsl:copy> 
    </xsl:template> 

    <!-- This element-only identity template prevents the 
     TLA namespace declaration from being copied to the output --> 
    <xsl:template match="*"> 
    <xsl:element name="{name()}"> 
     <xsl:apply-templates select="@* | node()" /> 
    </xsl:element> 
    </xsl:template> 

    <!-- Pass processing on to child elements of TLA elements --> 
    <xsl:template match="mbl:*"> 
    <xsl:apply-templates select="*" /> 
    </xsl:template> 
</xsl:stylesheet>
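On the question's "simpler way" point: the same filtering can also be done without XSLT. Below is a minimal sketch using Python's standard-library ElementTree, assuming the pages are well-formed XML (the XSLT approach requires that too). The names copy_filtered, is_tla and should_unwrap are hypothetical, chosen for illustration. Like the stylesheet above, it replaces each TLA element with copies of its non-TLA element children and discards the TLA element's own attributes and text. Whitespace-only text is dropped to mimic xsl:strip-space, and ElementTree's default parser discards comments.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the stylesheet above (xmlns:mbl="http://www.tla.com").
TLA_NS = "http://www.tla.com"


def is_tla(tag):
    """True for TLA elements; ElementTree spells namespaced tags '{uri}local'."""
    return isinstance(tag, str) and tag.startswith("{%s}" % TLA_NS)


def _kept(text):
    """Mimic <xsl:strip-space elements="*"/>: drop whitespace-only text."""
    return text if text and text.strip() else None


def _append_text(parent, text):
    """Append text after whatever parent currently contains."""
    if text is None:
        return
    if len(parent):
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def copy_filtered(elem, should_unwrap=is_tla):
    """Return the list of elements that replace `elem` in the output."""
    if should_unwrap(elem.tag):
        # Like <xsl:apply-templates select="*"/>: only element children
        # survive; this element's attributes and text are discarded.
        result = []
        for child in elem:
            result.extend(copy_filtered(child, should_unwrap))
        return result
    # Plain copy: tag, attributes and non-whitespace text are kept.
    clone = ET.Element(elem.tag, elem.attrib)
    clone.text = _kept(elem.text)
    for child in elem:
        clone.extend(copy_filtered(child, should_unwrap))
        # child.tail is text inside `elem`, so it belongs to the clone.
        _append_text(clone, _kept(child.tail))
    return [clone]


SAMPLE = """<html>
<head><title>Highly Simplified Example Form</title></head>
<body>
  <TLA:document xmlns:TLA="http://www.tla.com">
    <TLA:contexts><TLA:context id="id_1" value=""></TLA:context></TLA:contexts>
    <TLA:page>
      <TLA:question id="q_id_1">
        <table><tr><td><input id="input_id_1" type="text" /></td></tr></table>
      </TLA:question>
    </TLA:page>
    <!-- Repeat many times -->
  </TLA:document>
</body>
</html>"""

root = ET.fromstring(SAMPLE)  # the default parser drops the comment
(html,) = copy_filtered(root)
out = ET.tostring(html, encoding="unicode")
print(out)
```

Passing should_unwrap=lambda tag: isinstance(tag, str) and tag.startswith("{") instead unwraps every namespaced element, mirroring the *[namespace-uri()] variant below. As with that stylesheet, it would also unwrap the HTML itself if the pages declared the XHTML default namespace.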

If you want to exclude anything that has a non-empty namespace, you can also use this instead:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    xmlns:mbl="http://www.tla.com" exclude-result-prefixes="mbl"> 
    <xsl:output method="xml" indent="yes"/> 
    <xsl:strip-space elements="*" /> 

    <xsl:template match="@*|node()" priority="-2"> 
    <xsl:copy> 
     <xsl:apply-templates select="@*|node()"/> 
    </xsl:copy> 
    </xsl:template> 

    <xsl:template match="*"> 
    <xsl:element name="{name()}"> 
     <xsl:apply-templates select="@* | node()" /> 
    </xsl:element> 
    </xsl:template> 

    <xsl:template match="*[namespace-uri()]"> 
    <xsl:apply-templates select="*" /> 
    </xsl:template> 
</xsl:stylesheet>

When either one is run on your sample input, the result is:

<html> 
    <head> 
    <title>Highly Simplified Example Form</title> 
    </head> 
    <body> 
    <table> 
     <tr> 
     <td> 
      <input id="input_id_1" type="text" /> 
     </td> 
     </tr> 
    </table> 
    </body> 
</html>


2013-04-05 06:40:13 JLRishe

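I noticed a small problem with this: it doesn't output correct HTML. Note the self-closing input element, which is only valid in XHTML. Is there a way to get valid HTML out? When I run it against the real documents (using xsl:output method html), it leaves many tags unclosed. – 2013-04-09 08:27:23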

I don't understand the second sentence of your comment. If you want the output as HTML, you can change the 'xsl:output' method to "html". – JLRishe 2013-04-09 17:54:06
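The self-closing-tag issue in this exchange can be illustrated with Python's standard-library ElementTree, which, like XSLT, has separate "xml" and "html" serialization methods. This is a sketch for illustration only; the textarea element is a hypothetical addition not present in the sample. XML serialization writes every empty element as <tag />, but an HTML parser ignores that slash on non-void elements, so <textarea /> is read as an unclosed start tag. The HTML method instead writes void elements such as input with no end tag and all other elements with an explicit one. The XSLT 1.0 spec describes similar behaviour for its html output method, which should not output an end tag for empty elements such as br, img and input.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the sample's input plus an empty, non-void textarea.
fragment = ET.fromstring(
    '<td><input id="input_id_1" type="text" /><textarea id="notes" /></td>'
)

# XML rules: any empty element is written in self-closing form.
as_xml = ET.tostring(fragment, encoding="unicode")
# HTML rules: void elements get no end tag; others get an explicit end tag.
as_html = ET.tostring(fragment, encoding="unicode", method="html")

print(as_xml)
print(as_html)
```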

I think this might be better as a full question: http://stackoverflow.com/questions/15897500/closing-tags-when-extracting-html-from-xml – 2013-04-10 07:20:57
