How to extract lxml.etree tag text based on the value of a sibling tag

My goal is to pull the URLs from an XML document and put them in a list. The document is at: https://www.valuespreadsheet.com/iedgar/results.php?stock=NFLX&output=xml

I imported etree from lxml and wrote a list comprehension that extracts the text from every <instanceUrl> tag.

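from urllib.request import urlopen  # Python 3; on Python 2 this was urllib2.urlopen
from lxml import etree

url = 'https://valuespreadsheet.com/iedgar/results.php?stock=NFLX&output=xml'
et = etree.fromstring(urlopen(url).read())
return [_.find('instanceUrl').text for _ in et.find('filings')]  # assumes this runs inside a function body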

Now I want to restrict the list so that it only pulls the text from <instanceUrl> tags where <formType> = 10K.

How can I do this?


2017-01-18 p_sutherland

See also: http://stackoverflow.com/questions/38845273/can-you-permanently-change-python-code-by-input?noredirect=1&lq=1 – boson

You need an XPath expression and the xpath() method:

[url.text for url in et.xpath('//filing[formType = "10-K"]/instanceUrl')]

Here, we select the filing nodes that have a formType child node whose text is 10-K, and then get their instanceUrl children.
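To try the expression without hitting the network, here is a minimal sketch against an inline document. The sample XML, its root element name, and the example.com URLs are made up for illustration; only the filings/filing/formType/instanceUrl nesting is taken from the question, so the real feed may differ.

```python
from lxml import etree

# Hypothetical sample data: the element nesting follows the question,
# but the root name and the URLs are invented for this sketch.
SAMPLE = b"""<results>
  <filings>
    <filing>
      <formType>10-K</formType>
      <instanceUrl>https://example.com/10k-2016.xml</instanceUrl>
    </filing>
    <filing>
      <formType>10-Q</formType>
      <instanceUrl>https://example.com/10q-2016.xml</instanceUrl>
    </filing>
    <filing>
      <formType>10-K</formType>
      <instanceUrl>https://example.com/10k-2015.xml</instanceUrl>
    </filing>
  </filings>
</results>"""

et = etree.fromstring(SAMPLE)

# The predicate [formType = "10-K"] keeps only the filing elements that
# have a formType child with that exact text; /instanceUrl then steps to
# the sibling we actually want.
ten_k_urls = [url.text for url in et.xpath('//filing[formType = "10-K"]/instanceUrl')]
print(ten_k_urls)
```

Note that the predicate compares the full text of formType, so "10K" (without the hyphen) would match nothing in this sample.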

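Note that the _ variable name is conventionally used for throwaway variables: variables that have to be defined (for example, when unpacking) but are never actually used. In your case, you do use it.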
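A small sketch of that convention, using made-up (form type, URL) pairs rather than the real feed:

```python
# Hypothetical data for illustration only.
filings = [
    ("10-K", "https://example.com/a.xml"),
    ("10-Q", "https://example.com/b.xml"),
]

# Throwaway: the form type has to be unpacked but is never read,
# so naming it "_" signals that it is deliberately ignored.
urls = [url for _, url in filings]

# Not a throwaway: the loop variable is read inside the expression,
# so a descriptive name such as "filing" reads better than "_".
form_types = [filing[0] for filing in filings]
```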


2017-01-18 23:00:37 alecxe

I appreciate the answer on how to solve the problem, the explanation, and the advice on using the '_' variable name!
