ElementTree text mixed with tags

Imagine the following text:

<description> 
the thing <b>stuff</b> is very important for various reasons, notably <b>other things</b>. 
</description>

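How would I go about parsing this with the etree interface? Given the description tag, the .text attribute returns only the first word, "the". The .getchildren() method returns the <b> elements, but not the rest of the text.

Thanks a lot!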


2015-12-16 Daniel Lovasko

Use .text_content(). A working sample using lxml.html:

from lxml.html import fromstring 

data = """ 
<description> 
the thing <b>stuff</b> is very important for various reasons, notably <b>other things</b>. 
</description> 
""" 

tree = fromstring(data) 

print(tree.xpath("//description")[0].text_content().strip())

Prints:

the thing stuff is very important for various reasons, notably other things.
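
Since the question mentions the etree interface, here is a minimal sketch of a rough equivalent using only the standard library's xml.etree.ElementTree, which has no .text_content(). It assumes the snippet is well-formed XML and joins the pieces yielded by itertext() (the element's own text plus each descendant's text and tail, in document order):

```python
import xml.etree.ElementTree as ET

data = """
<description>
the thing <b>stuff</b> is very important for various reasons, notably <b>other things</b>.
</description>
"""

root = ET.fromstring(data)

# itertext() walks the element and its descendants in document order,
# yielding every text and tail string; joining them gives the flat text.
full_text = "".join(root.itertext()).strip()
print(full_text)
```

This also shows why .text alone looks truncated: it only holds the text before the first child element, and the remainder is stored in the .tail of each <b> element.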

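"I forgot to specify one thing though, sorry. My ideal parsed version would contain a list of subsections: [normal("the thing"), bold("stuff"), normal("....")]. Is that possible with the lxml.html library?"

Assuming you have only text nodes and b elements inside the description: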

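for item in tree.xpath("//description/*|//description/text()"):
    # Text nodes come back as str subclasses; anything else here is a <b> element.
    # (On Python 2, test against basestring instead of str.)
    print([item.strip(), 'normal'] if isinstance(item, str) else [item.text, 'bold'])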

Prints:

['the thing', 'normal'] 
['stuff', 'bold'] 
['is very important for various reasons, notably', 'normal'] 
['other things', 'bold'] 
['.', 'normal']
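
The same segment list can also be built without XPath by walking .text and .tail directly. The following is only a sketch using the standard library's xml.etree.ElementTree; the helper name split_segments is made up for illustration, and it assumes the description contains nothing but text and flat child elements, with any non-b child labeled 'normal':

```python
import xml.etree.ElementTree as ET


def split_segments(elem):
    """Return [text, style] pairs for elem's text and its direct children.

    Hypothetical helper: <b> children are labeled 'bold', everything else 'normal'.
    """
    segments = []
    # Text before the first child lives in elem.text.
    if elem.text and elem.text.strip():
        segments.append([elem.text.strip(), "normal"])
    for child in elem:
        style = "bold" if child.tag == "b" else "normal"
        segments.append(["".join(child.itertext()).strip(), style])
        # Text following a child (up to the next sibling) lives in child.tail.
        if child.tail and child.tail.strip():
            segments.append([child.tail.strip(), "normal"])
    return segments


data = (
    "<description>the thing <b>stuff</b> is very important for various "
    "reasons, notably <b>other things</b>.</description>"
)
segments = split_segments(ET.fromstring(data))
for segment in segments:
    print(segment)
```

Whitespace-only pieces are skipped, so the result should match the five pairs printed above.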


2015-12-16 18:12:16 alecxe

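I forgot to specify one thing though, sorry. My ideal parsed version would contain a list of subsections: [normal("the thing"), bold("stuff"), normal("....")]. Is that possible with the lxml.html library? –

@DanielLovasko sure, updated. – alecxe

Wow, pretty cool. Thanks! @alecxe –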
