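Modify an item in multiple parse functions and return the updated item?
I have an item that gets populated across several parse functions, and I want to return the updated item once all the parsing is complete. Here is my scenario: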
My item class:
from scrapy.item import Item, Field

class MyItem(Item):
    name = Field()
    links1 = Field()
    links2 = Field()
I have several URLs that I crawl after logging in.
In the parse function, I do this:
for url in urls:
    yield Request(url=url, callback=self.get_info)
In get_info, I extract "name" and "links" from each response:
item = MyItem()
item['name'] = hxs.select("//title/text()").extract()
links = []
for data in json_parsed_from_response:  # JSON decoded from the response body
    link = {}  # a new dict for every entry
    link['name'] = data.get('name')
    link['url'] = data.get('url')
    links.append(link)
item['links1'] = links
# item['links2'] is built the same way.
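A note on the extraction loop: if `link = {}` is created once before the loop, every `links.append(link)` appends the same dict object, so all entries end up holding the values of the last row. The plain-Python check below (hypothetical sample rows stand in for the parsed JSON; no Scrapy involved) contrasts reusing one dict with creating one per row:

```python
# Hypothetical rows standing in for json_parsed_from_response.
rows = [{"name": "a", "url": "http://x/a"}, {"name": "b", "url": "http://x/b"}]


def build_links_shared(rows):
    # One dict reused for every row: each append stores the same object.
    links, link = [], {}
    for data in rows:
        link["name"] = data.get("name")
        link["url"] = data.get("url")
        links.append(link)
    return links


def build_links_fresh(rows):
    # A new dict per row, so each entry keeps its own values.
    return [{"name": d.get("name"), "url": d.get("url")} for d in rows]


shared = build_links_shared(rows)
fresh = build_links_fresh(rows)
print([l["name"] for l in shared])  # ['b', 'b']
print([l["name"] for l in fresh])   # ['a', 'b']
```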
Now I want to request every URL in item['links1'] and item['links2'] like this (these loops are inside get_info):
for link in item['links1']:
    request = Request(url=link['url'], callback=self.get_status)
    request.meta['link'] = link
    yield request
for link in item['links2']:
    request = Request(url=link['url'], callback=self.get_status)
    request.meta['link'] = link
    yield request
# Where do I return item? I can't return it from inside a generator.
def get_status(self, response):
    link = response.meta['link']
    if b"good" in response.body:  # response.body is bytes on Python 3
        link['status'] = 'good'
    else:
        link['status'] = 'bad'
    # Will changes made here be reflected in item?
    # Also, I can't return item from here: it would be returned once per link.
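On the first question in get_status: the dict placed in `request.meta['link']` is the same object that sits in `item['links1']`, so setting `link['status']` updates the item as well, assuming the request stays in memory and is not serialized (for example, written to disk for a resumable crawl). This plain-Python sketch (a hypothetical dict stands in for MyItem) shows the reference semantics. What it does not solve is timing: an item yielded from get_info can be exported before any get_status callback has run.

```python
import copy

# Hypothetical item data; a plain dict stands in for MyItem here.
item = {"links1": [{"name": "a", "url": "http://x/a"}]}
link = item["links1"][0]

# Storing the link in meta keeps a reference, not a copy; even a shallow copy
# of the meta dict still points at the same link dict.
meta = dict(link=link)
meta["link"]["status"] = "good"      # what get_status does via response.meta
print(item["links1"][0]["status"])   # good

# Only a deep copy (e.g. a serialized request) would detach the two.
detached = copy.deepcopy(meta)
detached["link"]["status"] = "bad"
print(item["links1"][0]["status"])   # good
```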
I can't figure out where item should be returned so that it contains all the updated data.
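One common way around the "where do I return item" problem is to check the links one at a time instead of fanning out: get_info queues every link and requests only the first, and each get_status call either chains the next request (carrying the same item in meta) or, when no links remain, yields the finished item exactly once. The sketch below models that flow in plain Python so it runs without Scrapy or a network. `FakeRequest`, `FakeResponse`, `PAGES`, `start_status_chain` and `run` are hypothetical stand-ins for `scrapy.Request`, the downloaded response, the target pages, the tail of get_info, and Scrapy's scheduler; `get_status` is written as a plain function rather than a spider method.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class FakeRequest:  # hypothetical stand-in for scrapy.Request
    url: str
    callback: Callable
    meta: Dict = field(default_factory=dict)


@dataclass
class FakeResponse:  # hypothetical stand-in for a downloaded response
    url: str
    body: bytes
    meta: Dict


# Hypothetical page bodies standing in for real HTTP responses.
PAGES = {
    "http://a.example/1": b"all good here",
    "http://a.example/2": b"broken",
    "http://b.example/1": b"good",
}


def start_status_chain(item):
    """End of get_info: queue every link, but request only the first one."""
    pending = list(item["links1"]) + list(item["links2"])
    if not pending:
        yield item  # no links to check, the item is already complete
        return
    link = pending.pop(0)
    yield FakeRequest(link["url"], get_status,
                      meta={"item": item, "link": link, "pending": pending})


def get_status(response):
    link = response.meta["link"]
    link["status"] = "good" if b"good" in response.body else "bad"
    item, pending = response.meta["item"], response.meta["pending"]
    if pending:
        # More links left: chain the next request with the same item.
        nxt = pending.pop(0)
        yield FakeRequest(nxt["url"], get_status,
                          meta={"item": item, "link": nxt, "pending": pending})
    else:
        # Last link checked: every status is filled in, emit the item once.
        yield item


def run(start):
    """Tiny scheduler: 'download' requests from PAGES, collect yielded items."""
    queue, results = deque(start), []
    while queue:
        out = queue.popleft()
        if isinstance(out, FakeRequest):
            resp = FakeResponse(out.url, PAGES[out.url], out.meta)
            queue.extend(out.callback(resp))
        else:
            results.append(out)
    return results


# Hypothetical item as get_info would have built it.
item = {
    "name": ["Example"],
    "links1": [{"name": "one", "url": "http://a.example/1"},
               {"name": "two", "url": "http://a.example/2"}],
    "links2": [{"name": "three", "url": "http://b.example/1"}],
}
results = run(start_status_chain(item))
print(len(results))  # 1
```

The trade-off is that the links are fetched sequentially rather than concurrently. In a real spider, a request that is dropped by the duplicate filter or fails would break the chain and the item would never be yielded, so passing `dont_filter=True` to `Request` and adding an errback that continues the chain would be worth considering.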
Where do you create the item itself? Where do you create MyItem? Can you show the whole code?
It's in get_info. Sorry, I can't post the code, it's too big. I've posted the exact scenario. – rajpy