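scrapy: multiple fields from one page, calling different callbacks

On one page, I want to follow two links, scrape some information from each linked page, and collect everything into a single item. My code is: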
# Spider methods (Python 2, old Scrapy selector API)
from itertools import izip

from scrapy.http import Request

def parse(self, response):
    a = '/html/body/div[3]/div/div/div[3]/ul/li[position()>1]/ul/li/a/'
    # query() is a helper that returns HtmlXPathSelector(response).select(xpath).extract()
    song_names = query(a + 'text()', response)
    song_links = query(a + '@href', response)
    for name, link in izip(song_names, song_links):
        yield Request(
            url=self.host + link,
            meta={'item': BdmmsItem(singer=name)},
            callback=self.parse_single_song)

def parse_single_song(self, response):
    item = response.meta['item']
    # query() returns a list; keep the lists and index only after checking them
    album_links = query('a[contains(@href, "/album/")]/@href', response)
    lrc_links = query('//a[@lyricdata]/@lyricdata', response)
    # here I want to visit the two different pages to get different information
    if lrc_links:
        yield Request(
            url=lrc_links[0],
            meta={'item': item},
            callback=self.parse_lrc)
    if album_links:
        yield Request(
            url=album_links[0],
            meta={'item': item},
            callback=self.parse_album)
    # with urllib2 I would do the following, but how do I do that in Scrapy?
    '''
    item['lrc'] = urllib2.urlopen(lrc_links[0]).read()
    item['album'] = some_other_func(urllib2.urlopen(album_links[0]).read())
    '''

def parse_lrc(self, response):
    item = response.meta['item']
    item['lrc'] = response.body
    yield item

def parse_album(self, response):
    item = response.meta['item']
    item['album'] = query('div[@id="album-info"]', response)
    yield item

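This yields two items per song, one from each callback. How can I make it produce a single item that contains the information from both pages?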
Have you tried the example from the docs (http://doc.scrapy.org/en/latest/topics/request-response.html#passing-additional-data-to-callback-functions)? – alecxe 2013-04-26 08:15:41
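Yes, I have, but that is not what I want. I have to handle **two different callbacks** on **different pages**; see the code above. The docs example crawls page1 first and then calls parse_page2, but in my case the two pages are not sequential and do not need to pass parameters to each other. – user2322187 2013-04-26 10:35:09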
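
One common way to end up with a single item is to chain the two requests even though the pages are logically independent: request the lyrics page first, carry the album link along in `meta`, and yield the item only from the last callback. The following is a minimal sketch of that idea, not a definitive implementation. To keep it runnable without Scrapy installed, `Request`, `Response`, `PAGES` and `crawl` are hypothetical stand-ins for Scrapy's request/response objects, the website and the engine, and it is written for Python 3. In a real spider the callbacks would be spider methods and the links would come from the XPath queries in the question.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Request:
    """Stand-in for scrapy.http.Request (only url, callback, meta)."""
    url: str
    callback: Callable
    meta: Dict = field(default_factory=dict)


@dataclass
class Response:
    """Stand-in for a Scrapy response: a body plus the meta of its request."""
    url: str
    body: object
    meta: Dict


# Hypothetical fake site used instead of real HTTP responses.
PAGES = {
    '/song/1': {'lrc_link': '/lrc/1', 'album_link': '/album/1'},
    '/lrc/1': 'la la la',
    '/album/1': 'Greatest Hits',
}


def parse_single_song(response):
    item = response.meta['item']
    links = response.body  # real spider: run the XPath queries here
    lrc_link = links.get('lrc_link')
    album_link = links.get('album_link')
    if lrc_link:
        # Fetch the lyrics first and carry the album link along in meta.
        yield Request(url=lrc_link, callback=parse_lrc,
                      meta={'item': item, 'album_link': album_link})
    elif album_link:
        yield Request(url=album_link, callback=parse_album,
                      meta={'item': item})
    else:
        yield item  # nothing more to fetch


def parse_lrc(response):
    item = response.meta['item']
    item['lrc'] = response.body
    album_link = response.meta.get('album_link')
    if album_link:
        # Chain to the album page instead of yielding a half-filled item.
        yield Request(url=album_link, callback=parse_album,
                      meta={'item': item})
    else:
        yield item


def parse_album(response):
    item = response.meta['item']
    item['album'] = response.body
    yield item  # last step of the chain: the item is emitted once


def crawl(start):
    """Minimal driver standing in for Scrapy's engine."""
    queue, items = [start], []
    while queue:
        request = queue.pop(0)
        response = Response(request.url, PAGES[request.url], request.meta)
        for result in request.callback(response):
            if isinstance(result, Request):
                queue.append(result)
            else:
                items.append(result)
    return items


items = crawl(Request('/song/1', parse_single_song,
                      {'item': {'singer': 'some name'}}))
print(items)
```

The trade-off of this design is that the two pages are fetched one after the other for each song rather than in parallel, but each song produces exactly one item, emitted from whichever callback ends the chain.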