How to parse an iframe with lxml in Python?

I found that lxml can't parse the HTML elements inside an iframe. How can I parse an iframe with lxml in Python?

import lxml.html
from urllib.request import urlopen
url = "http://news.163.com/special/mhmingdan/?bdsj"
file = urlopen(url).read()                   # raw bytes of the page
root = lxml.html.document_fromstring(file)   # parse into an element tree
tab = root.xpath('//iframe')                 # every <iframe> element

How can I get lxml to return the HTML elements inside the iframe?

2014-03-27 it_is_a_literature

You should use forward slashes // instead of backslashes \\:

tab = root.xpath('//iframe')
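A small offline check of the point above, using made-up sample markup rather than the 163.com page: '//iframe' selects every <iframe> element in the document, while a backslash is not XPath syntax, so lxml rejects the expression with an XPath error instead of returning anything.

```python
import lxml.html
from lxml import etree

# Hypothetical stand-in page, not the real 163.com markup.
SAMPLE = b"""<html><body>
<p>Intro</p>
<iframe src="/frames/list.html" width="600"></iframe>
</body></html>"""

root = lxml.html.document_fromstring(SAMPLE)

# '//iframe' means: every <iframe> element anywhere in the document.
tab = root.xpath('//iframe')
print(len(tab), tab[0].get('src'))

# The <iframe> element itself has no child elements in the outer page.
print(len(tab[0]))

# A backslash is not valid XPath, so lxml raises an XPath error.
try:
    root.xpath('\\iframe')
    backslash_rejected = False
except etree.XPathError as exc:
    backslash_rejected = True
    print("invalid XPath:", exc)
```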

Also, you can simplify fetching and parsing the page by passing the urlopen() result directly to parse():

root = lxml.html.parse(urlopen(url))
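lxml.html.parse() accepts a file name, a URL, or a file-like object, which is why the urlopen() response can be handed to it directly. Note that it returns an ElementTree rather than an element; call getroot() if you need the <html> element itself. A minimal offline sketch, with io.BytesIO standing in for the urlopen() response and made-up markup:

```python
import io
import lxml.html

# io.BytesIO is an assumed stand-in for the file-like object urlopen(url) returns.
fake_response = io.BytesIO(
    b"<html><body><iframe src='/frames/list.html'></iframe></body></html>"
)

tree = lxml.html.parse(fake_response)  # an ElementTree, not an element
root = tree.getroot()                  # the <html> element
tab = root.xpath('//iframe')
print(root.tag, tab[0].get('src'))
```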

2014-03-27 03:32:39 alecxe

I've corrected it, but why can't I get the iframe? –

@it_is_a_literature First, you shouldn't edit the question this way. Also, if you print out 'tab', you'll see that the 'iframe' elements are found. – alecxe

There is a table node under the iframe. Why can't I get it? –

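import requests
from lxml import html
page = requests.get(url)
tree = html.fromstring(page.content)
src_url = tree.cssselect("iframe")  # requires the separate cssselect package
print(src_url[0].attrib)            # attributes of the first <iframe>, e.g. its src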
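The snippet above only reads the iframe's attributes, and that is the key to the "table under the iframe" question: an <iframe> element merely points at a separate document through its src attribute, and lxml parses only the HTML it is given; it never downloads the framed page. A minimal sketch of the follow-up step, assuming the framed page is static HTML reachable at the src URL: resolve src against the outer page URL, fetch that document, and parse it on its own. The helper iframe_tables, its fetch parameter, and the example.com pages are hypothetical, for illustration only.

```python
from urllib.parse import urljoin
import lxml.html

def iframe_tables(page_html, base_url, fetch):
    """Hypothetical helper: return (iframe_url, [table elements]) pairs.

    `fetch` is any callable mapping a URL to raw HTML bytes, for example
    lambda u: requests.get(u).content (an assumption, not part of lxml).
    """
    root = lxml.html.fromstring(page_html)
    results = []
    for frame in root.xpath('//iframe[@src]'):
        # src is often relative, so resolve it against the outer page URL.
        frame_url = urljoin(base_url, frame.get('src'))
        # The framed content is a separate document: fetch it and parse it.
        inner = lxml.html.fromstring(fetch(frame_url))
        results.append((frame_url, inner.xpath('//table')))
    return results

# Offline usage with made-up pages instead of real HTTP requests.
PAGES = {
    "http://example.com/frames/list.html":
        b"<html><body><table><tr><td>1</td></tr></table></body></html>",
}
outer = b"<html><body><iframe src='/frames/list.html'></iframe></body></html>"
found = iframe_tables(outer, "http://example.com/page/", lambda u: PAGES[u])
for url, tables in found:
    print(url, len(tables))
```

Passing fetch in as a parameter keeps the parsing logic separate from the HTTP library, so the same function works with urlopen, requests, or a local cache.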

2017-01-08 16:48:14 fingerlake

On Stack Overflow it is good practice to add an explanation of why your solution works. For more information, read [How to Answer](//stackoverflow.com/help/how-to-answer). –
