Selecting the parent of a specific node using XPath/Python

How can I get the href value of the <a> in this HTML snippet?

I need to select it based on the class of the <i> tag inside it.

<!-- 
<a href="https://link.com" target="_blank"><i class="foobar"></i> </a>   
-->

I tried this, but I get no results:

foo_links = tree.xpath('//a[i/@class="foobar"]')


2017-04-13 Brett Webb

Your code does work for me: it returns a list of <a> elements. If you want the href values rather than the elements themselves, append /@href:

hrefs = tree.xpath('//a[i/@class="foobar"]/@href')

You can also find the <i> elements first, then use /parent::* (or simply /..) to get back to the <a> elements.

hrefs = tree.xpath('//a/i[@class="foobar"]/../@href')
#                     ^                    ^^ ^
#                     |                    |  obtain the 'href'
#                     |                    |
#                     |                    get the parent of the <i>
#                     |
#                     find all <i class="foobar"> contained in an <a>
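
To make the two expressions above easier to compare, here is a minimal, self-contained sketch. The sample markup (including the second link and the `https://other.example` URL) is made up for illustration, and it assumes the <a> is real markup rather than commented out.

```python
import lxml.html

# Hypothetical sample markup: the question's <a>, NOT wrapped in a comment,
# plus a second link whose <i> has a different class.
SAMPLE = (
    '<div>'
    '<a href="https://link.com" target="_blank"><i class="foobar"></i> </a>'
    '<a href="https://other.example"><i class="baz"></i></a>'
    '</div>'
)

tree = lxml.html.fromstring(SAMPLE)

# Filter the <a> elements by their <i> child, then read the attribute.
via_predicate = tree.xpath('//a[i/@class="foobar"]/@href')

# Start from the <i>, step up to its parent with "..", then read the attribute.
via_parent = tree.xpath('//a/i[@class="foobar"]/../@href')

print(via_predicate, via_parent)
```

Both queries should select the same attribute here; which one reads better is mostly a matter of taste.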

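If none of this works, you may need to verify that the structure of the document is what you expect.

Note that XPath does not look inside comments. If the <a> really is inside a comment, you need to extract the comment's content and parse it yourself first.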

import lxml.html

hrefs = [
    href
    for comment in tree.xpath('//comment()')  # find all comments
    # parse the content of each comment as a new HTML fragment
    for href in lxml.html.fromstring(comment.text)
        .xpath('//a[i/@class="foobar"]/@href')  # read those hrefs
]
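
As a rough end-to-end sketch of the same idea: the page below is invented for illustration, and `hrefs_in_comments` is a hypothetical helper name, not part of lxml. It also skips comments that contain no markup, on the assumption that such comments are not worth parsing.

```python
import lxml.html

# Hypothetical page: the link exists only inside an HTML comment.
PAGE = '''<html><body>
<p>visible text</p>
<!--
<a href="https://link.com" target="_blank"><i class="foobar"></i> </a>
-->
<!-- -->
</body></html>'''

tree = lxml.html.fromstring(PAGE)


def hrefs_in_comments(tree, class_name="foobar"):
    """Collect href values of <a> elements hidden inside HTML comments."""
    found = []
    for comment in tree.xpath('//comment()'):
        text = comment.text or ''
        if '<' not in text:
            # Nothing tag-like to parse; skip blank or plain-text comments.
            continue
        # Parse the comment's content as its own small HTML document.
        fragment = lxml.html.fromstring(text)
        # $cls is an XPath variable filled in from the keyword argument.
        found.extend(fragment.xpath('//a[i/@class=$cls]/@href', cls=class_name))
    return found


# Querying the page directly does not look inside the comment.
direct = tree.xpath('//a[i/@class="foobar"]/@href')
from_comments = hrefs_in_comments(tree)
print(direct, from_comments)
```

The query on the original tree finds nothing because the comment's content is plain text to the parser; only the re-parsed fragment exposes the <a> as an element.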


2017-04-13 15:17:44 kennytm

Just curious, why not simply '//a/@href'? – SomeDude

@svasa The OP said: "*I need to select it based on the class of the <i> tag*" – kennytm

OK, I missed that. Got it. – SomeDude

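Note that the target element is inside an HTML comment. You cannot simply get the <a> out of a comment with an XPath like "//a", because in this case it is not a node but a plain string.

Try the following code: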

import re

foo_links = tree.xpath('//comment()')  # get a list of all comments on the page
for link in foo_links:
    if '<i class="foobar">' in link.text:
        # get the href value from the matching comment
        href = re.search(r'\w+://\w+\.\w+', link.text).group(0)
        break

P.S. You may need a more elaborate regular expression to match the full link URL.
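
One possible shape for such a pattern, as a sketch only: it assumes the href value is always wrapped in double quotes, and it collects every matching comment instead of stopping at the first. The page content and the `HREF_RE` name are invented for illustration.

```python
import re
import lxml.html

# Hypothetical page with several commented-out links.
PAGE = '''<html><body>
<!-- <a href="https://link.com/path?x=1" target="_blank"><i class="foobar"></i> </a> -->
<!-- <a href="https://second.example/page"><i class="foobar"></i></a> -->
<!-- <a href="https://skip.example"><i class="other"></i></a> -->
</body></html>'''

# Capture whatever sits between the double quotes of the href attribute.
HREF_RE = re.compile(r'href="([^"]+)"')

tree = lxml.html.fromstring(PAGE)

hrefs = []
for comment in tree.xpath('//comment()'):
    text = comment.text or ''
    if '<i class="foobar">' in text:
        match = HREF_RE.search(text)
        if match:  # guard against comments that have the <i> but no href
            hrefs.append(match.group(1))

print(hrefs)
```

Capturing the quoted attribute value avoids having to describe URL syntax in the regex, at the cost of breaking on single-quoted or unquoted attributes.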


2017-04-13 15:35:23 Andersson

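This seems to work best. The comment markup (<!--) was what was getting in the way. I did add an extra .\w+ to capture the rest of the URL. For some reason I only get one record at a time, and a different one each time I run it. There may be a problem with how I append to the list I created. Thanks –

Removed the 'break' and I got what I was after –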
