Is there a way to disable URL encoding of anchor attributes in lxml?

I'm using lxml 2.2.8 and trying to convert some existing HTML files into Django templates. The only problem I'm having is that lxml URL-encodes the anchor name and href attributes. For example:

<xsl:template match="a"> 
<!-- anchor attribute href is urlencoded but the title is escaped --> 
<a href="{{{{item.get_absolute_url}}}}" title="{{{{item.title}}}}"> 
    <!-- name tag is urlencoded --> 
    <xsl:attribute name="name">{{item.name}}</xsl:attribute> 
    <!-- but other attributes are not --> 
    <xsl:attribute name="nid">{{item.nid}}</xsl:attribute> 
    <xsl:attribute name="class">{{item.class_one}}</xsl:attribute> 
    <xsl:apply-templates/> 
</a>
</xsl:template>

produces HTML like this:

<a href="%7B%7Bitem.get_absolute_url%7D%7D" 
    title="{{item.title}}" name="%7B%7Bitem.name%7D%7D" 
    nid="{{item.nid}}" class="{{item.class_one}}">more info</a>

What I'd like to get is this:

<a href="{{item.get_absolute_url}}">more info</a>

Is there a way to disable the (automatic) URL encoding that lxml is doing?

Here is (basically) the code I'm using to generate and parse the file:

from lxml import etree, html 
from StringIO import StringIO 

doc = StringIO(
'''<html> 
<head> 
    <title>An experiment</title> 
</head> 
<body> 
<p class="one">This is an interesting paragraph detailing the inner workings of something</p> 
<p class="two">paragraph with <a href="/link/to/more">more info</a></p> 
<p>posted by: me</p> 
</body> 
</html>''') 

stylesheet = StringIO(
'''<?xml version="1.0" encoding="UTF-8"?> 
<xsl:stylesheet version="1.0" 
xmlns:xhtml="http://www.w3.org/1999/xhtml" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
exclude-result-prefixes="xhtml xsl"> 
<xsl:template match="p[@class='one']"> 
    <xsl:copy> 
     <!-- when adding an attribute with the xsl:attribute tag --> 
     <!-- the curly braces are not escaped, i.e. you don't have --> 
     <!-- to double them up --> 
     <xsl:attribute name="class">{{item.class_one}}</xsl:attribute> 
     <xsl:attribute name="nid">{{item.nid}}</xsl:attribute> 
     <xsl:apply-templates/> 
    </xsl:copy> 
</xsl:template> 

<xsl:template match="p[@class='two']"> 
    <!-- but double 'em up in this instance --> 
    <p class="{{{{item.class_two}}}}"> 
     <xsl:apply-templates/> 
    </p> 
</xsl:template> 

<xsl:template match="a"> 
    <!-- anchor attribute href is urlencoded but the title is escaped --> 
    <a href="{{{{item.get_absolute_url}}}}" title="{{{{item.title}}}}"> 
     <!-- name tag is urlencoded --> 
     <xsl:attribute name="name">{{item.name}}</xsl:attribute> 
     <!-- but other attributes are not --> 
     <xsl:attribute name="nid">{{item.nid}}</xsl:attribute> 
     <xsl:attribute name="class">{{item.class_one}}</xsl:attribute> 
     <xsl:apply-templates/> 
    </a> 
</xsl:template> 

<xsl:template match="@*|node()"> 
    <xsl:copy> 
    <xsl:apply-templates /> 
    </xsl:copy> 
</xsl:template> 
</xsl:stylesheet> 
''') 
def parse_doc(): 
    xsl = etree.parse(stylesheet) 
    trans = etree.XSLT(xsl) 
    root = html.parse(doc, etree.HTMLParser(encoding="windows-1252")) 
    transformed = trans(root) 
    print html.tostring(transformed) 

if __name__ == '__main__': 
    parse_doc()

The only difference is that the real files are malformed HTML :)

Source

2011-01-13 AnvilRockRoad

Does this still happen if you declare `` in your XSLT program? – Tomalak 2011-01-13 20:25:36

I can reproduce this only with Altova, using `html` method serialization. MSXSL 3/4 and Saxon do not do it. – 2011-01-13 20:32:38

Maybe you could use the XML serializer instead of the HTML serializer.

>>> from lxml import etree, html 
>>> 
>>> t = etree.XML('<a href="{{x}}" />') 
>>> 
>>> etree.tostring(t) 
'<a href="{{x}}"/>' 
>>> html.tostring(t) 
'<a href="%7B%7Bx%7D%7D"></a>'
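
To apply the same idea to the XSLT result from the question, something along these lines might work. This is only a minimal Python 3 sketch and has not been checked against lxml 2.2.8. The stylesheet is a cut-down stand-in for the one in the question, and the names `XSL`, `SOURCE` and `transform_to_xml_text` are made up for illustration. The point is just that the result tree is serialized with `etree.tostring(..., method="xml")` rather than `html.tostring`.

```python
from lxml import etree

# Cut-down, hypothetical stylesheet: every <a> gets a Django tag as its href.
# Inside an attribute value template, "{{{{" and "}}}}" come out as "{{" and "}}".
XSL = b'''<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="a">
    <a href="{{{{item.get_absolute_url}}}}"><xsl:apply-templates/></a>
  </xsl:template>
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
</xsl:stylesheet>'''

# Hypothetical input document, shortened from the one in the question.
SOURCE = ('<html><body><p>paragraph with '
          '<a href="/link/to/more">more info</a></p></body></html>')


def transform_to_xml_text(source_html, stylesheet_bytes):
    """Parse HTML, run the XSLT, and serialize the result with the XML serializer."""
    transform = etree.XSLT(etree.XML(stylesheet_bytes))
    root = etree.fromstring(source_html, etree.HTMLParser())
    result = transform(root)
    # method="xml" avoids the HTML serializer, which is where the
    # URL escaping of href/name was observed in the session above.
    return etree.tostring(result, method="xml", encoding="unicode")


output = transform_to_xml_text(SOURCE, XSL)
print(output)
```

One thing to watch for: the XML serializer writes empty elements in self-closing form (for example `<br/>`), so the output may need a second look if the templates have to be served as plain HTML.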

Source

2011-01-13 20:25:52 jensq