How do I find the parent node of a text node?

If I use:

import requests
from lxml import html

response = requests.get(url='someurl')
tree = html.document_fromstring(response.text)

all_text = tree.xpath('//text()')  # returns every text node on the page

then the all_text list holds all of the text from the page. Now I want to know, given:

text_searched = all_text[all_text.index('any string which is in all_text list')]

Is it possible to reach the web element that contains the searched text?


2016-02-20 Mohit Tamta

I think BeautifulSoup is a better choice. – skyline75489

You can use the getparent() method for this purpose, for example:

..... 
..... 
all_text = tree.xpath('//text()') 

first_text = all_text[0] 
parent_element = first_text.getparent() 

# encoding='unicode' makes tostring() return str instead of bytes
print(html.tostring(parent_element, encoding='unicode'))
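For a version that runs without a network request, here is a minimal sketch of the same lookup. The markup in `page` and the searched string are made up for illustration and stand in for `response.text` and the string from the question.

```python
from lxml import html

# Hypothetical markup standing in for response.text.
page = "<html><body><h1>Title</h1><p>any string</p></body></html>"
tree = html.document_fromstring(page)

all_text = tree.xpath('//text()')

# XPath text results are str subclasses, so list.index() can match them
# against a plain string; the result still carries getparent().
text_searched = all_text[all_text.index('any string')]
parent_element = text_searched.getparent()
parent_tag = parent_element.tag

print(parent_tag)
```

Note that `index()` needs an exact match, including surrounding whitespace, so on a real page the text may have to be stripped or compared more loosely.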

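Note that the behavior of getparent() might not be the one you expected when the current text node comes after an element node within the same parent element. Because of the tree model implemented by lxml, the text in this case is treated as the tail of the preceding element rather than as a child of the containing element, so getparent() returns the preceding element. See the example below to get a clear idea of what I mean: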

from lxml import html

# "bar" follows the <span> inside the same <div>
raw = '''<div>
    <span>foo</span>
    bar
</div>'''
root = html.fromstring(raw)
texts = root.xpath('//text()[normalize-space()]')
print(list(texts))
# output : ['foo', '\n    bar\n']

print([html.tostring(e.getparent(), encoding='unicode') for e in texts])
# output : ['<span>foo</span>\n    bar\n', '<span>foo</span>\n    bar\n']
# see that calling getparent() on 'bar' returns the <span> not the <div>
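If what you want is the enclosing element rather than the preceding sibling, one possible workaround is to check the `is_tail` attribute that lxml sets on XPath string results and go up one more level for tail text. The helper name `containing_element` is hypothetical, and this is only a sketch of the idea, not part of the lxml API.

```python
from lxml import html


def containing_element(text_node):
    """Return the element whose tags enclose text_node (hypothetical helper)."""
    owner = text_node.getparent()
    if text_node.is_tail:
        # Tail text belongs to the preceding sibling, so the element
        # that encloses it is that sibling's parent.
        return owner.getparent()
    return owner


raw = '<div><span>foo</span>bar</div>'
root = html.fromstring(raw)
texts = root.xpath('//text()')

# 'foo' is the text of <span>; 'bar' is the tail of <span>, inside <div>.
tags = [containing_element(t).tag for t in texts]
print(tags)
```

With this markup, the helper reports the `<span>` for 'foo' and the `<div>` for 'bar'.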


2016-02-20 09:25:07 har07

Hi, the solution works. Thank you very much for your help. –

@MohitTamta You're welcome :) – har07
