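XPath: how to get the text of a node and its child nodes

I want an XPath expression that returns all the text contained in a specific node and in its child nodes.

In the example below, I am trying to get: "Neil Carmichael (Stroud) (Con):"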

<p> 
<a class="anchor" name="qn_o0"> </a> 
<a class="anchor" name="160210-0001.htm_wqn0"> </a> 
<a class="anchor" name="160210109000034"> </a> 
1. <a class="anchor" name="160210109000555"> </a> 
    <b><b>Neil Carmichael</b> 
    "(Stroud) (Con):" 
    </b> 
    "What assessment he has made of the value to the economy in Scotland of UK membership of the single market. [903484]" 
</p>

So far I have only managed to get one part or the other, using the code below:

from lxml import html 
import requests 
page = requests.get('http://www.publications.parliament.uk/pa/cm201516/cmhansrd/cm160210/debtext/160210-0001.htm') 
tree = html.fromstring(page.content) 

test2 = tree.xpath('//div[@id="content-small"]/p[(a[@name[starts-with(.,"st_o")]] or a[@name[starts-with(.,"qn_")]])]/b/text()')

Any help is welcome!

Source

2016-02-21 Fred_B

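Stop your XPath at /b so that it returns the <b> elements rather than the text nodes inside <b>. You can then call text_content() on each element to get the expected text, including the text of descendant elements, for example: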

from lxml import html 

raw = '''<p> 
<a class="anchor" name="qn_o0"> </a> 
<a class="anchor" name="160210-0001.htm_wqn0"> </a> 
<a class="anchor" name="160210109000034"> </a> 
1. <a class="anchor" name="160210109000555"> </a> 
    <b><b>Neil Carmichael</b> 
    "(Stroud) (Con):" 
    </b> 
    "What assessment he has made of the value to the economy in Scotland of UK membership of the single market. [903484]" 
</p>''' 

root = html.fromstring(raw)
result = root.xpath('//p/b')  # select the <b> elements, not their text nodes
print(result[0].text_content())  # text of the element and all its descendants

# output : 
# 'Neil Carmichael\n  "(Stroud) (Con):"\n '

As an alternative to text_content(), you can use the XPath string() function, optionally combined with normalize-space() to collapse runs of whitespace:

print(result[0].xpath('string(normalize-space())'))
# output : 
# Neil Carmichael "(Stroud) (Con):"
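
To handle every matching paragraph rather than only result[0], the same idea can be applied in a loop. The following is a minimal sketch, not the only way to do it: the SAMPLE markup and the speaker_labels helper are hypothetical, modelled on the question's HTML (without the literal quote characters), and the predicate is a simplified form of the one in the question.

```python
from lxml import html

# Hypothetical sample modelled on the question's markup (not the real page).
SAMPLE = '''<div id="content-small">
<p><a class="anchor" name="qn_o0"> </a>1.
    <b><b>Neil Carmichael</b>
    (Stroud) (Con):
    </b>
    What assessment he has made of the value to the economy.
</p>
<p><a class="anchor" name="st_o1"> </a>
    <b><b>Jane Doe</b>
    (Anytown) (Lab):
    </b>
    A follow-up question.
</p>
<p><a class="anchor" name="other"> </a><b>Not a speaker line</b></p>
</div>'''


def speaker_labels(tree):
    """Return the whitespace-normalised text of each matching <b>, descendants included."""
    # Stop the path at /b so that elements, not text nodes, are returned.
    elements = tree.xpath(
        '//div[@id="content-small"]'
        '/p[a[starts-with(@name, "st_o") or starts-with(@name, "qn_")]]/b'
    )
    # normalize-space() with no argument works on the string value of the
    # context node, which already includes the text of nested elements.
    return [b.xpath('normalize-space()') for b in elements]


labels = speaker_labels(html.fromstring(SAMPLE))
for label in labels:
    print(label)
```

Each <b> is evaluated on its own, so the text of one speaker label never runs into the next. The third paragraph is skipped because its anchor name starts with neither "st_o" nor "qn_".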

Source

2016-02-21 13:48:15 har07

Thanks, I had tried something similar, but the trick was accessing one element at a time with [0].
