我有一個看起來像這樣的HTML元素:如何分組XPath?
我想組h1
,div.article-meta
和div.article-content
,這樣我就可以寫循環一行的數據線在我的Scrapy項目。
我正在考慮將它們中的每一個分組到一個var,然後循環該var,我不知道如何去做。
請建議。謝謝,
到目前爲止,我已經試過這樣:
def parse(self, response):
now = time.strftime('%Y-%m-%d %H:%M:%S')
hxs = scrapy.Selector(response)
titles = hxs.xpath('//div[@class="list-article"]/h1')
images = hxs.xpath('//div[@class="list-article"]/feature-image')
contents = hxs.xpath('//div[@class="list-article"]/article-content')
for i, title in titles:
item = DapnewsItem()
item['categoryId'] = '1'
name = titles[i].xpath('a/text()')
if not name:
print('DAP => [' + now + '] No title')
else:
item['name'] = name.extract()[0]
description = contents[i].xpath('p/text()')
if not description:
print('DAP => [' + now + '] No description')
else:
item['description'] = description[1].extract()
url = titles[i].xpath("a/@href")
if not url:
print('DAP => [' + now + '] No url')
else:
item['url'] = url.extract()[0]
imageUrl = images[i].xpath('img/@src')
if not imageUrl:
print('DAP => [' + now + '] No imageUrl')
else:
item['imageUrl'] = imageUrl.extract()[0]
yield item
這是我得到的錯誤。
您好,我已經更新了我的SOFAR – Vicheanak