Unable to scrape product URLs using Scrapy in Python

I want to extract all product URLs from the page "http://www.shopclues.com/diwali-mega-mall/hot-electronics-sale-fs/audio-systems-fs.html" using Scrapy in Python. Below is the function I am using to do this:

def parse(self, response):
    print("hello")

    hxs = HtmlXPathSelector(response)
    sites = hxs.select('//div[@id="pagination_contents"]')
    items = []
    i = 3
    for site in sites:
        item = DmozItem()
        item['link'] = site.select('div[2]/div[' + str(i) + ']/a/@href').extract()
        i = int(i) + 1
        print(i)
        items.append(item)
    return items

The XPath for each product div is: //div[@id="pagination_contents"]/div[2]/div['+str(i)+']/a/@href

But I only get one link instead of the URLs of all the products.


2013-10-26 user2747776

I think your problem is that hxs.select('//div[@id="pagination_contents"]') returns only one result, so your loop runs only a single iteration.

Instead, you can select all the <div> elements that have an <a> child and iterate over those:

sites = hxs.select('//div[@id="pagination_contents"]/div[2]/div[a]') 
for site in sites: 
    ## This loop will run 33 times in my test. 
    ## Access to each link: 
    item['link'] = site.select('./a[2]/@href').extract()
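
To see why the original loop yields a single link, here is a minimal sketch that runs without Scrapy. It makes several assumptions: the markup is an invented, simplified stand-in for the real page, and the standard library's xml.etree.ElementTree (which supports a small XPath subset) stands in for Scrapy's selector. The names SAMPLE, containers, first_only, and product_divs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; NOT the real shopclues page.
SAMPLE = """
<html><body>
  <div id="pagination_contents">
    <div>filters</div>
    <div>
      <div>heading</div>
      <div>sorting</div>
      <div><a href="/product-1.html">Product 1</a></div>
      <div><a href="/product-2.html">Product 2</a></div>
      <div><a href="/product-3.html">Product 3</a></div>
    </div>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE)

# The container query matches exactly one element, so a loop over it
# runs once and the counter i never advances past its first value.
containers = root.findall(".//div[@id='pagination_contents']")
first_only = containers[0].find("div[2]/div[3]/a").get("href")

# Selecting the inner <div> elements that have an <a> child returns
# one node per product, so the loop visits every product.
product_divs = root.findall(".//div[@id='pagination_contents']/div[2]/div[a]")
links = [div.find("a").get("href") for div in product_divs]

print(len(containers), first_only)
print(links)
```

Only the XPath expressions carry over to a spider; the parsing calls here are ElementTree's, not Scrapy's, and the real page may need a different inner path.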


2013-10-26 14:50:18 Birei
