How to parse nested HTML tags using XPath

I need to parse an HTML document using HtmlXPathSelector.

def parse(self, response):
    edxData = HtmlXPathSelector(response)

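First, I need to get all the tags matched by edxData.xpath('//h2[@class="title course-title"]').
Inside each of those tags, I need to check the value of the <a> tag.
Then I need to parse the div tag with the class name subtitle course-subtitle copy-detail. How can I parse this value? Any suggestions would be appreciated.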

Sample HTML response data:


<html> 
<body> 
<h2 class="title course-title"> 
<a href="https://www.edx.org/course/mitx/mitx-14-73x-challenges-global-poverty-1350">The Challenges of Global Poverty 
</a> 
</h2> 
<div class="subtitle course-subtitle copy-detail">A course for those who are interested in the challenge posed by massive and persistent world poverty. 
</div> 
</body> 
</html>


2014-01-30 Nagarajan

One way could be:

>>> for h2 in sel.xpath('//h2[@class = "title course-title"]'): 
...     print(h2.xpath('a'))
... 
[<Selector xpath='a' data=u'<a href="https://www.edx.org/course/mitx'>]

Or even simply:

>>> sel.xpath('//h2[@class = "title course-title"]/a') 
[<Selector xpath='//h2[@class = "title course-title"]/a' data=u'<a href="https://www.edx.org/course/mitx'>]

To find the other element, simply use:

>>> sel.xpath('//div[@class="subtitle course-subtitle copy-detail"]') 
[<Selector xpath='//div[@class="subtitle course-subtitle copy-detail"]' data=u'<div class="subtitle course-subtitle cop'>]

It looks like you are using Scrapy; please also tag the question accordingly.


2014-01-30 15:24:22
