Scrapy image download
My spider runs without showing any errors, but the images are not saved in the folder. Below are my Scrapy files:
Spider.py:
import scrapy
import re
import os
import urlparse
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy.loader.processors import Join, MapCompose, TakeFirst
from scrapy.pipelines.images import ImagesPipeline
from production.items import ProductionItem, ListResidentialItem
class productionSpider(scrapy.Spider):
    name = "production"
    allowed_domains = ["someurl.com"]
    start_urls = [
        "someurl.com"
    ]

    def parse(self, response):
        for sel in response.xpath('//html/body'):
            item = ProductionItem()
            img_url = sel.xpath('//a[@data-tealium-id="detail_nav_showphotos"]/@href').extract()[0]
            yield scrapy.Request(urlparse.urljoin(response.url, img_url), callback=self.parseBasicListingInfo, meta={'item': item})

    def parseBasicListingInfo(item, response):
        item = response.request.meta['item']
        item = ListResidentialItem()
        try:
            image_urls = map(unicode.strip, response.xpath('//a[@itemprop="contentUrl"]/@data-href').extract())
            item['image_urls'] = [x for x in image_urls]
        except IndexError:
            item['image_urls'] = ''
        return item
settings.py:
from scrapy.settings.default_settings import ITEM_PIPELINES
from scrapy.pipelines.images import ImagesPipeline
BOT_NAME = 'production'
SPIDER_MODULES = ['production.spiders']
NEWSPIDER_MODULE = 'production.spiders'
DEFAULT_ITEM_CLASS = 'production.items'
ROBOTSTXT_OBEY = True
DEPTH_PRIORITY = 1
IMAGE_STORE = '/images'
CONCURRENT_REQUESTS = 250
DOWNLOAD_DELAY = 2
ITEM_PIPELINES = {
    'scrapy.contrib.pipeline.images.ImagesPipeline': 300,
}
items.py:
# -*- coding: utf-8 -*-
import scrapy

class ProductionItem(scrapy.Item):
    img_url = scrapy.Field()

# ScrapingList Residential & Yield Estate for sale
class ListResidentialItem(scrapy.Item):
    image_urls = scrapy.Field()
    images = scrapy.Field()
    pass
My pipeline file is empty, and I'm not sure what I should add to the pipeline.py file.
Any help is greatly appreciated.
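A minimal sketch of the image-related part of settings.py, assuming Scrapy 1.0 or later (the spider already imports from scrapy.pipelines.images). As far as I know, the built-in ImagesPipeline reads the setting IMAGES_STORE (plural) and is quietly disabled when that setting is missing, which would match a run with no errors and no files. The relative path 'images' is only an illustrative choice and not taken from the question. The pipeline also needs Pillow installed, and the pipeline file can stay empty as long as only the built-in pipeline is used.

```python
# settings.py (sketch): only the settings that affect image downloading.
BOT_NAME = 'production'

# ImagesPipeline looks for IMAGES_STORE, not IMAGE_STORE.
# 'images' is a hypothetical relative directory; a path such as '/images'
# points at the filesystem root and may not be writable.
IMAGES_STORE = 'images'

# Module path used by Scrapy 1.0+; 'scrapy.contrib.pipeline.images'
# is the older location of the same class.
ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 300,
}
```

With these two settings in place, any item that carries an image_urls list should have its images fetched and the results written to the item's images field.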
Thanks Rafael, but still no images populate the images folder. I added the pipeline to the settings.py file, changed the storage path, and changed the following lines: image_urls = map(unicode.strip, response.xpath('//a[@itemprop="contentUrl"]/@data-href').extract()) item['image_urls'] = [x for x in image_urls] to item['image_urls'] = map(unicode.strip, response.xpath('//a[@itemprop="contentUrl"]/@data-href').extract()) – user1443063
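You can't map the images. If you want to save multiple images in one item, you have to build an array (a list) rather than a map; a map will not work. –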
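To illustrate the list-versus-map point in the comment above, here is a small sketch that runs on Python 3, where map returns a lazy iterator rather than a list. The question's code appears to target Python 2 (it uses urlparse and unicode), where map already returns a list, so this distinction may only matter after a move to Python 3. The hrefs values are made up for the example.

```python
# Hypothetical values standing in for the result of .extract()
hrefs = [' /photos/1.jpg ', '/photos/2.jpg']

# On Python 3, map() gives a one-shot iterator, not a list.
lazy = map(str.strip, hrefs)

# A list comprehension always produces a real list.
as_list = [href.strip() for href in hrefs]

print(isinstance(lazy, list))     # False on Python 3
print(isinstance(as_list, list))  # True
print(as_list)                    # ['/photos/1.jpg', '/photos/2.jpg']
```

Assigning the list comprehension result to item['image_urls'] keeps the field a plain list on either Python version.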
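I am very new to all of this. I tried to fix it by changing the line to: item['image_urls'] = response.xpath('//a[@itemprop="contentUrl"]/@data-href').extract()[0] The [0] should give only one image, but it still doesn't show up. Am I still missing something, or does it still need to be an array? – user1443063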
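On the last question: as far as I know, image_urls has to be a list even for a single image, because the pipeline iterates over the field and builds one request per element, and each element needs to be an absolute URL with a scheme. extract()[0] yields a bare string, so it would need wrapping. Below is a hedged sketch written for Python 3; as_image_url_list, page_url and first_href are hypothetical names and values, not part of Scrapy or of the question.

```python
from urllib.parse import urljoin  # Python 3; Python 2 would use urlparse.urljoin


def as_image_url_list(base_url, extracted):
    """Accept one string or a list of strings; return a list of absolute URLs."""
    if isinstance(extracted, str):
        extracted = [extracted]  # wrap a single URL so the result stays a list
    return [urljoin(base_url, href.strip()) for href in extracted]


# Hypothetical stand-ins for response.url and .extract()[0]
page_url = 'http://someurl.com/listing/42'
first_href = ' /photos/1.jpg '

image_urls = as_image_url_list(page_url, first_href)
print(image_urls)  # ['http://someurl.com/photos/1.jpg']
```

In the spider this would look like item['image_urls'] = as_image_url_list(response.url, response.xpath(...).extract()), so the same helper covers both the one-image and the many-images case.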