How to scrape URLs with Scrapy in Python

I want to use Scrapy in Python to extract all of the product URLs from the page "http://presskr.com/category/Mobiles--Tablets/35". Below is the function I am using to do this:

def parse(self, response):
    print("hello")

    hxs = HtmlXPathSelector(response)
    sites = hxs.select('//div[@id="pagination_contents"]')
    items = []
    i = 3
    for site in sites:
        item = DmozItem()
        item['link'] = site.select('div[2]/div[' + str(i) + ']/a/@href').extract()
        i = int(i) + 1
        print(i)
        items.append(item)
    return items

The XPath for each product div is: //div[@id="pagination_contents"]/div[2]/div['+str(i)+']/a/@href

But I only get one link back, not the URLs of all the products.

Source

2016-01-31 Sunny Mishra

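Try the following. I suggest working through the Scrapy tutorial and following its steps, which saves you most of the manual work. Your case is very close to this example: http://doc.scrapy.org/en/latest/intro/tutorial.html#extracting-the-data, so just follow that step.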

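def parse(self, response):
    # Select every product anchor directly instead of indexing divs by hand.
    for href in response.xpath('//span[@class="itemlistinginfo"]/a/@href'):
        # response.urljoin resolves a relative href against the page URL.
        full_url = response.urljoin(href.extract())
        item = DmozItem()
        item['link'] = full_url
        yield item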
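
To see why the loop in the question yields only one item, here is a minimal sketch that uses only the Python standard library (xml.etree.ElementTree and urllib.parse.urljoin) rather than Scrapy. The sample markup, the /product/N paths, and the helper names container_matches and product_links are all made up for illustration; the real page will differ. The point it tries to show is that a query for the container with a unique id matches once, so a loop over it runs once, whereas a query for the anchors themselves matches once per product.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the category page (not the real markup).
SAMPLE_HTML = """
<html><body>
  <div id="pagination_contents">
    <div>header</div>
    <div>
      <div><span class="itemlistinginfo"><a href="/product/1">Phone A</a></span></div>
      <div><span class="itemlistinginfo"><a href="/product/2">Phone B</a></span></div>
      <div><span class="itemlistinginfo"><a href="/product/3">Tablet C</a></span></div>
    </div>
  </div>
</body></html>
""".strip()

BASE_URL = "http://presskr.com/category/Mobiles--Tablets/35"


def container_matches(root):
    """Return the elements matched by the container query from the question."""
    return root.findall(".//div[@id='pagination_contents']")


def product_links(root, base_url):
    """Return one absolute URL per product anchor."""
    anchors = root.findall(".//span[@class='itemlistinginfo']/a")
    # urljoin resolves each relative href against the page URL.
    return [urljoin(base_url, a.get("href")) for a in anchors]


root = ET.fromstring(SAMPLE_HTML)
containers = container_matches(root)
links = product_links(root, BASE_URL)

print(len(containers))  # number of times a loop over the container would run
for link in links:
    print(link)
```

In a spider, the same idea is expressed by iterating over the anchor matches, as in the answer above, instead of iterating over the single container and incrementing an index by hand.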

Source

2016-01-31 08:51:10 Turo

Thank you very much, Turo. I got it.
