How to scrape lazy-loaded images with Python Scrapy

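Here is the code I use to crawl the page. The site I want to scrape has lazy loading enabled for its images, so Scrapy only gets 10 of the 100 images; the rest are all placeholder.jpg. What is the best way to handle lazy-loaded images in Scrapy?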

Thanks!

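import scrapy


class MasseffectSpider(scrapy.Spider):
    name = "massEffect"
    allowed_domains = ["amazon.com"]
    start_urls = [
        'file://127.0.0.1/home/ec2-user/scrapy/amazon/amazon.html',
    ]

    def parse(self, response):
        # `items` was never defined in the snippet; 'div.item' is a
        # hypothetical selector for one product block on the page
        for item in response.css('div.item'):
            # a plain dict stands in for the undefined Item() class
            listing = {}
            listing['image'] = item.css('div.product img::attr(src)').extract()
            listing['url'] = item.css('div.item-name a::attr(href)').extract()
            # yield each listing instead of appending to an undefined list
            yield listing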

It looks like other tools, such as CasperJS, can set a viewport large enough to make the images load:

var casper = require('casper').create();

casper.start('http://m.facebook.com', function() {
    // The pretty HUGE viewport allows for roughly 1200 images.
    // If you need more you can either resize the viewport or scroll down the viewport to load more DOM (probably the best approach).
    this.viewport(2048, 4096);

    // login_username and login_password are assumed to be defined elsewhere
    this.fill('form#login_form', {
        'email': login_username,
        'pass': login_password
    }, true);
});

casper.run();
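For comparison, the same idea can be sketched in Python with Selenium WebDriver. This is only a sketch under assumptions: `render_lazy_page`, its parameters and the gallery URL are hypothetical, and it assumes Firefox with geckodriver is installed. Besides opening a tall window (like `this.viewport(2048, 4096)`), it scrolls one screen at a time, the "scroll down" approach the CasperJS comment recommends, so that every image passes through the viewport before the HTML is read.

```python
import time


def render_lazy_page(driver, url, width=2048, height=4096, pause=1.0, max_steps=50):
    """Return the page HTML after lazy-loaded images have had a chance to load.

    `driver` is expected to be a Selenium WebDriver (or any object with the
    same set_window_size/get/execute_script/page_source interface).
    """
    # A tall window puts more images "in view", like casper.viewport(2048, 4096)
    driver.set_window_size(width, height)
    driver.get(url)
    step = driver.execute_script("return window.innerHeight")
    position = 0
    for _ in range(max_steps):
        # Scroll one screen at a time so each image enters the viewport
        position += step
        driver.execute_script("window.scrollTo(0, arguments[0]);", position)
        time.sleep(pause)  # give the page's JavaScript time to swap in real src values
        # scrollHeight is re-read every time because lazy content can grow the page
        if position >= driver.execute_script("return document.body.scrollHeight"):
            break
    return driver.page_source


if __name__ == "__main__":
    from selenium import webdriver

    browser = webdriver.Firefox()
    try:
        html = render_lazy_page(browser, "http://example.com/lazy-gallery")  # hypothetical URL
        print(len(html))
    finally:
        browser.quit()
```

The returned HTML can then be parsed with the same CSS selectors the spider already uses; the scroll-then-read order matters because `page_source` is only a snapshot of the DOM at that moment.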


2016-04-30 Will W

Can you share the site you're crawling? A pastebin would work. – eLRuLL

The problem is that the lazy loading is done by JavaScript, which Scrapy can't handle; CasperJS does handle it.

To make this work with Scrapy, you have to combine it with Selenium or scrapyjs.
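One common way to combine Scrapy with Selenium is a downloader middleware that fetches selected requests in a real browser and hands the rendered HTML back to Scrapy. The sketch below is illustrative, not a library API: the class name and the `render_js` meta key are hypothetical, it assumes Firefox with geckodriver, and Scrapy/Selenium are imported inside the methods only so the class can be imported without them installed. It does not scroll, so a page that loads images on scroll may still need that step before `page_source` is read.

```python
class SeleniumRenderMiddleware:
    """Hypothetical Scrapy downloader middleware that renders pages in a browser.

    Only requests sent with meta={"render_js": True} go through the browser;
    all other requests keep Scrapy's normal downloader.
    """

    def __init__(self, driver):
        self.driver = driver

    @classmethod
    def from_crawler(cls, crawler):
        # Imported here so the class can be imported without Scrapy/Selenium installed
        from scrapy import signals
        from selenium import webdriver

        middleware = cls(webdriver.Firefox())
        # Close the browser when the spider finishes
        crawler.signals.connect(middleware.spider_closed, signal=signals.spider_closed)
        return middleware

    def process_request(self, request, spider):
        if not request.meta.get("render_js"):
            return None  # let Scrapy download this request normally
        from scrapy.http import HtmlResponse

        self.driver.get(request.url)
        # Returning a Response skips Scrapy's own download; the spider callback
        # receives the DOM as the browser sees it after the page has loaded
        return HtmlResponse(
            url=self.driver.current_url,
            body=self.driver.page_source,
            encoding="utf-8",
            request=request,
        )

    def spider_closed(self, spider):
        self.driver.quit()
```

To try it, the middleware would be enabled in settings.py, e.g. `DOWNLOADER_MIDDLEWARES = {"amazon.middlewares.SeleniumRenderMiddleware": 543}` (module path hypothetical), and the spider would yield `scrapy.Request(url, meta={"render_js": True})` for pages whose images are lazy-loaded.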


2016-04-30 13:30:14

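To scrape lazy-loaded images, you have to track down the Ajax request that returns the images, then make that same request from Scrapy. Once you have all the data from a given page, pass the extracted data on to the next callback through the request's meta. For further help, see the Scrapy Request documentation.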


2016-05-02 13:19:41


