2013-08-05

I want to convert HTML to JSON and automate the conversion with XSLT. My input HTML file is:

<!DOCTYPE html> 
<html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml" prefix="dcterms: http://purl.org/dc/terms/ dc: http://purl.org/dc/elements/1.1/ gz: https://www.gazettes.co.uk/metadata"> 
<head> 
<title property="dc:title" about="https://www.gazettes.co.uk/content/5">Managing the probate process on your own</title> 
<meta name="dcterms.format" content="application/xhtml+xml" /> 
<meta name="dcterms.subject" xml:lang="en" content="case study" /> 
<meta name="dcterms.subject" xml:lang="en" content="wills and probate" /> 
<meta name="dcterms.identifier" content="https://www.gazettes.co.uk/content/5" /> 
<meta name="dcterms.relation" content="https://www.gazettes.co.uk/wills-and-probate" /> 
<meta name="gz.position" content="related pane first" /> 
<meta name="gz.weight" content="0" />  
</head> 
<body> 
<article> 
<header> 
<h1 class="title">User profile: Managing the probate process on your own</h1> 
</header> 
<dl>     
<dt>Created date</dt> 
<dd about="https://www.gazettes.co.uk/content/5" property="dcterms:created" content="2013-03-28">28/03/2013</dd> 
<dt>Publication date</dt> 
<dd about="https://www.gazettes.co.uk/content/5" property="dcterms:issued" content="2013-03-28">28/03/2013</dd>     
</dl> 
<section class="abstract-short" about="https://www.gazettes.co.uk/content/7" property="dcterms:abstract"> 
<p>Lisa Rutherford is a shop assistant, whose grandmother died last year. As the personal representative …</p> 
</section> 
<section class="content">     
<p>Lisa Rutherfas keen to keep costs down and if possible not involve any lawyers.</p> 
<p>Before dealas infor, she wishes &#x201C;there was more easily accessible information&#x201D; and that &#x201C;the process was simpler and easier to navigate&#x201D;.</p> 
<p>Lisa describt orderal is an online checklist that could have guided me step by step through the process.&#x201D;</p> 
<p>Inforobate process.</p> 
<p>As peopls of probate.</p> 
<em>The above study pr this semi-fictional account is based on extensive research and user profile analysis.</em> 
<em><a href="mailto:[email protected]">[email protected]</a></em> 
</section> 
</article> 
</body> 
</html> 

And my XSLT is:

<?xml version="1.0" encoding="UTF-8"?> 
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:html="http://www.w3.org/1999/xhtml" xmlns:jn="http://www.json.org" version="2.0" exclude-result-prefixes="html"> 

<xsl:output method="html" encoding="UTF-8" media-type="text/plain" indent="yes"/> 

<!-- strip whitespace from whitespace-only nodes --> 
<xsl:strip-space elements="*"/> 
<xsl:template match="/"> 
<xsl:apply-templates/> 
</xsl:template> 
<!-- handle element nodes --> 
<xsl:template match="html:html"> 
{ 
"title": "<xsl:value-of select="html:head/html:title"/>", 
<xsl:if test="html:head/html:title/@about!=''"> 
     "identifier": "<xsl:value-of select="html:head/html:title/@about"/>", 
</xsl:if> 
<xsl:if test="html:head/html:meta[@content='block' or @content='footer' or @content='page' or @content='news']/@content!=''"> 
"subject": "<xsl:value-of select="html:head/html:meta[@content='block' or @content='footer' or @content='page' or @content='news']/@content"/>", 
</xsl:if> 
"relation": ["<xsl:choose> 
<xsl:when test="substring-after(html:head/html:meta[@name='dcterms.relation']/@content,'https://www.gazettes.co.uk/')"> 
<xsl:value-of select="substring-after(html:head/html:meta[@name='dcterms.relation']/@content,'https://www.gazettes.co.uk')"/> 
</xsl:when> 
<xsl:otherwise> 
<xsl:text>Global</xsl:text></xsl:otherwise></xsl:choose>"], 
"created": "<xsl:value-of select=" 
concat(html:body/html:article/html:dl/html:dd[@property='dcterms:created']/@content,'T23:59:00')"/>", 
"issued": "<xsl:value-of select="concat(html:body/html:article/html:dl/html:dd[@property='dcterms:issued']/@content,'T23:59:00')"/>", 
<xsl:if test="html:head/html:meta[@name='gz.position']"> 
"position": "<xsl:value-of select="html:head/html:meta[@name='gz.position']/@content"/>", 
</xsl:if> 
"weight": "<xsl:value-of select="html:head/html:meta[@name='gz.weight']/@content"/>", 
"source": { 
"uri": "<xsl:value-of select="html:body/html:article/html:dl/html:dd[@property='dc:source']/@content"/>", 
"text": "<xsl:value-of select="html:body/html:article/html:dl/html:dd[@property='dc:source']/text()"/>" 
     }, 
"creator": { 
"uri": "[creator]", 
"text": "[creatorName]" 
     }, 
"rights": "[copyrightattributionURI]", 
<xsl:for-each select="html:body/html:article/html:section"> 
<xsl:if test="./@class!=''"> 
<xsl:text>"</xsl:text><xsl:value-of select="@class"/><xsl:text>: "</xsl:text> 
<xsl:for-each select="./*"> 

<xsl:element name="{local-name()}"><xsl:copy-of select="node()"/></xsl:element> 
</xsl:for-each> 

</xsl:if> 
<xsl:if test="position()!= last()">, </xsl:if> 
</xsl:for-each>  
} 
</xsl:template> 
</xsl:stylesheet> 

And my output is:

{ 
"title": "Managing the probate process on your own", 

"identifier": "https://www.gazettes.co.uk/content/5", 

"relation": ["/wills-and-probate"], 
"created": "2013-03-28T23:59:00", 
"issued": "2013-03-28T23:59:00", 

"position": "related pane first", 

"weight": "0", 
"source": { 
"uri": "", 
"text": "" 
}, 
"creator": { 
"uri": "[creator]", 
"text": "[creatorName]" 
}, 
"rights": "[copyrightattributionURI]", 
"abstract-short: "<p>Lisa Rutherford is a shop assistant, whose grandmother died last year. As the personal representative …</p>, "content: " 
<p>Lisa Rutherford is a shop assistant, whose grandmother died last year. As the personal representative of her grandmother’s 
    will, Lisa was responsible for dealing with her grandmother’s affairs. At the time, Lisa’s family was on a tight budget and 
    she was keen to keep costs down and if possible not involve any lawyers. 
</p> 
<p>Before dealing with her grandmother’s will, Lisa had only an introductory knowledge of probate and was informed by friends 
    and colleagues on what needed to be done. Reflecting on her experience, she wishes “there was more easily accessible information” 
    and that “the process was simpler and easier to navigate”. 
</p> 
<p>Lisa describes how it was difficult for her to understand what forms she needed to sign and in what order she should have 
    proceeded with probate. She concludes, “what would have been ideal is an online checklist that could have guided me step by 
    step through the process.” 
</p> 
<p>Informed of what would have made Lisa’s experience easier, The Gazette has included in its new Wills and Probate service an 
    online probate checklist that will guide each user step by step through the probate process. 
</p> 
<p>As people like Lisa increasingly choose to undertake the probate process themselves, there is a growing need for an easily 
    accessible and easy to navigate online probate checklist. In order to meet this need, The Gazettes new Wills and Probate service 
    includes a comprehensive online checklist that guides each person step by step through the legalities and practicalities of 
    probate. 
</p><em>The above study profiles personal and professional needs in relation to The Gazettes services. While the identity of the individual 
    and the context of his or her circumstance have been altered, this semi-fictional account is based on extensive research and 
    user profile analysis.</em><em><a xmlns="http://www.w3.org/1999/xhtml" href="mailto:[email protected]">[email protected]</a></em>  
} 

My problem is that <a xmlns="http://www.w3.org/1999/xhtml" href="mailto:[email protected]">[email protected]</a> carries a default namespace declaration that I want to remove. That is, it should be <a href="mailto:[email protected]">[email protected]</a>

Can anyone help me?

Answer


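If you want to change the namespace of the elements (from the XHTML namespace to no namespace), you can't use xsl:copy-of, which copies them exactly as they are. The <a> element sits inside an <em> that your xsl:element rebuilt in no namespace, so the copied <a> keeps its XHTML namespace and the serializer has to declare it. You need a recursive template that reconstructs every element instead: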

<xsl:template match="*" mode="no-ns"> 
    <xsl:element name="{local-name()}"> 
        <xsl:copy-of select="@*"/> 
        <xsl:apply-templates mode="no-ns"/> 
    </xsl:element> 
</xsl:template> 
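
To show how that template might be wired into the stylesheet from the question, here is a minimal sketch. It is cut down to the section-handling part only, and it assumes the same input document and the same `html` prefix binding as in the question. The inner `xsl:for-each` with `xsl:copy-of select="node()"` is swapped for `xsl:apply-templates` in the `no-ns` mode, so every descendant element (including the `a` inside `em`) is rebuilt in no namespace. The closing quote after each key and after each value is my addition, not part of the original stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:html="http://www.w3.org/1999/xhtml"
                version="2.0" exclude-result-prefixes="html">

  <xsl:output method="html" encoding="UTF-8"/>
  <xsl:strip-space elements="*"/>

  <!-- Sketch only: emit one "class": "markup" pair per section that has a class -->
  <xsl:template match="/">
    <xsl:for-each select="html:html/html:body/html:article/html:section[@class != '']">
      <xsl:text>"</xsl:text>
      <xsl:value-of select="@class"/>
      <xsl:text>": "</xsl:text>
      <!-- Rebuild the section's children instead of copying them -->
      <xsl:apply-templates select="*" mode="no-ns"/>
      <xsl:text>"</xsl:text>
      <xsl:if test="position() != last()">, </xsl:if>
    </xsl:for-each>
  </xsl:template>

  <!-- Recreate each element under its local name, i.e. in no namespace -->
  <xsl:template match="*" mode="no-ns">
    <xsl:element name="{local-name()}">
      <!-- Attributes such as href are in no namespace, so copying them is safe -->
      <xsl:copy-of select="@*"/>
      <!-- Recurse; text nodes are output by the built-in template rule -->
      <xsl:apply-templates mode="no-ns"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

With this in place the `a` element should be serialized without the `xmlns` declaration, because it is no longer an XHTML-namespace node nested inside a no-namespace parent. Note that the sketch does not escape the double quotes inside the copied markup (for example around the `href` value), so the result is still not valid JSON without a further escaping step.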

Could you please explain this with a working example? – Sakthivel


If you don't understand it, then you need to do some reading. Get yourself a good XSLT textbook. Or at least tell us which part you don't understand. –