Find all the h3 tags, iterate over them, and at each step of the loop find the first p sibling tag that follows:
import urllib2
from lxml import etree

URL = "http://www.kb.cert.org/vuls/id/628463"

response = urllib2.urlopen(URL)
# Use the lenient HTML parser; the default XML parser rejects this page
parser = etree.HTMLParser()
tree = etree.parse(response, parser)

for header in tree.iter('h3'):
    # First following <p> sibling in document order, if there is one
    paragraph = header.xpath('(.//following-sibling::p)[1]')
    if paragraph:
        print "%s: %s" % (header.text, paragraph[0].text)
Prints:
Overview: The Ruby on Rails 3.0 and 2.3 JSON parser contain a vulnerability that may result in arbitrary code execution.
Description: Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability.
Impact: Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability.
Solution: Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability.
Vendor Information : Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability.
CVSS Metrics : Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability.
References: Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability.
Credit: Thanks to Lawrence Pit of Mirror42 for discovering the vulnerability.
Feedback: If you have feedback, comments, or additional information about this vulnerability, please send us
Subscribe to Updates: Receive security alerts, tips, and other updates.
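To see how the sibling lookup behaves without fetching the page, here is a minimal Python 3 sketch run against a small inline document. The `HTML` snippet and the `heading_paragraphs` helper are made up for illustration and are not part of the original answer, and it uses the simpler `following-sibling::p[1]` form of the expression. It suggests one possible reason why several headings in the output above share the same paragraph: the first following `p` sibling does not have to be directly adjacent to the heading.

```python
from lxml import etree

# Hypothetical stand-in for the downloaded page (not the real CERT markup)
HTML = """
<html><body>
<h3>Overview</h3>
<p>First paragraph.</p>
<h3>Impact</h3>
<div>Not a paragraph.</div>
<h3>Credit</h3>
<p>Thanks to the reporter.</p>
</body></html>
"""


def heading_paragraphs(html):
    """Return (heading text, paragraph text) pairs for every h3 in html."""
    root = etree.fromstring(html, etree.HTMLParser())
    pairs = []
    for header in root.iter('h3'):
        # On a forward axis, [1] selects the nearest following <p> sibling
        found = header.xpath('following-sibling::p[1]')
        if found:
            pairs.append((header.text, found[0].text))
    return pairs


pairs = heading_paragraphs(HTML)
for title, text in pairs:
    print("%s: %s" % (title, text))
```

In this sketch the "Impact" heading has no paragraph of its own, so it picks up the paragraph that belongs to "Credit". If that is not wanted, one option is to check that the paragraph found comes before the next `h3`.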
Thank you for your reply. I get this error: lxml.etree.XMLSyntaxError: Opening and ending tag mismatch: link line 1 and head, line 1, column 485. This is the code:
from lxml import etree
import urllib
from StringIO import StringIO
url = 'http://www.kb.cert.org/vuls/id/628463'
text = urllib.urlopen(url).read()
f = StringIO(text)
tree = etree.parse(f)
headers = tree.xpath('//h3')
for header in headers:
    paragraph = header.xpath('(.//following-sibling::p)[1]')[0]
    print "%s: %s" % (header.text, paragraph.text)
P.S. I am new to Python and XPath. – Gomeisa
@Golbarghajian I updated the code, please check. – alecxe
It works, thanks sooo much. – Gomeisa
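For anyone who hits the same XMLSyntaxError: a likely cause is that `etree.parse` uses a strict XML parser by default, and HTML with unclosed tags such as `link` is not well-formed XML. The Python 3 sketch below reproduces the difference on a made-up snippet. `BROKEN`, `parse_strict` and `parse_lenient` are illustrative names, not part of the thread.

```python
from io import StringIO
from lxml import etree

# Hypothetical HTML with an unclosed <link>, as is common in real pages
BROKEN = (
    '<html><head><link rel="stylesheet" href="style.css"></head>'
    '<body><h3>Overview</h3><p>Text.</p></body></html>'
)


def parse_strict(text):
    """Parse with the default XML parser; return the error message, or None."""
    try:
        etree.parse(StringIO(text))
        return None
    except etree.XMLSyntaxError as exc:
        return str(exc)


def parse_lenient(text):
    """Parse with HTMLParser, which tolerates unclosed tags."""
    tree = etree.parse(StringIO(text), etree.HTMLParser())
    return [h.text for h in tree.iter('h3')]


error = parse_strict(BROKEN)
titles = parse_lenient(BROKEN)
print(error)
print(titles)
```

Passing `etree.HTMLParser()` as the second argument, as the answer does, should be enough to get past the error.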