Pillow + Scrapy = sometimes cannot identify image file

I have a small bug with Scrapy and Pillow. I know there are many questions about the "same" problem, but I have tried everything I found and nothing works.

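I use Scrapy to parse many websites, more than 100,000 pages. I created a pipeline that checks whether a page contains an image. If it does, the pipeline downloads the image and then creates a thumbnail at the same path. I do it this way so that if creating the thumbnail fails, I still have the "big" version of the image.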
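Writing the thumbnail over the downloaded file at the same path only keeps the "big" version safe if a failed save leaves that file untouched. Opening a path for writing truncates it immediately, so an exception in the middle of a save can leave a half-written file behind. A common way to avoid this is to write to a temporary file in the same directory and rename it into place. The sketch below is Python 3, stdlib only; `atomic_write_bytes` is a hypothetical helper name, not part of Scrapy or Pillow.

```python
import os
import tempfile


def atomic_write_bytes(path, data):
    """Write data so that path holds either the old content or the complete new content."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # The temporary file lives in the same directory, so os.replace() is a
    # rename within one filesystem (atomic on POSIX).
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave the partially written temporary file behind.
        os.unlink(tmp_path)
        raise
```

With Pillow, the thumbnail could be saved into an `io.BytesIO` buffer first (`image.save(buf, format="JPEG", optimize=True, quality=85)`) and the buffer's bytes passed to this helper, so the original file is only replaced once the thumbnail has been fully encoded.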

Here is some of the code:

import os
import urllib

from PIL import Image
from scrapy.exceptions import DropItem
from scrapy.utils.project import get_project_settings
from slugify import slugify

# Assumption: `settings` is the project settings object (its origin is not shown in the original snippet)
settings = get_project_settings()


class DownloadImageOnDisk(object):
    def process_item(self, item, spider):
        try:
            # If there is an image on the page
            if item['image']:
                img = item['image']
                # Get the image extension from the URL
                ext = img.split('.')
                ext = ext[-1].split('?')
                ext = ext[0]
                key = self.remove_accents(item['imagetitle']).encode('utf-8', 'replace')
                path = settings['IMG_PATH'] + item['website'] + '/' + key + '.' + ext

                # Create the directory
                if not os.path.exists(settings['IMG_PATH'] + item['website']):
                    os.makedirs(settings['IMG_PATH'] + item['website'])

                # Only download if the image does not exist yet
                if not os.path.isfile(path):
                    # Download the full-size image (Python 2 urllib)
                    urllib.urlretrieve(img, path)
                    if os.path.isfile(path):
                        # Overwrite it with a thumbnail
                        self.optimize_image(path)

                item['image'] = item['website'] + '/' + key + '.' + ext

            return item
        except Exception as exc:
            # NOTE: swallowing every exception here (including the DropItem
            # raised by optimize_image) makes process_item return None
            pass

    # Slugify the file name
    def remove_accents(self, input_str):
        try:
            return slugify(input_str)
        except Exception as exc:
            raise DropItem(exc)

    # Create the thumbnail in place
    def optimize_image(self, path):
        try:
            image = Image.open(path)
            # Image.ANTIALIAS is named Image.LANCZOS in newer Pillow releases
            image.thumbnail((200, 200), Image.ANTIALIAS)
            image.save(path, optimize=True, quality=85)
        except IOError as exc:
            raise DropItem(exc)
        except Exception as exc:
            raise DropItem(exc)
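The extension logic above splits the whole URL on `.` and `?`. When the URL path has no dot, the last `.` is the one in the host name, so `http://example.com/image?id=3` yields `com/image` as the "extension", which also puts a `/` into the file name. A sketch of a more robust version, assuming Python 3 (`urllib.parse`; in Python 2 the same function lives in `urlparse`). The name `image_extension` and the `"jpg"` fallback are illustrative assumptions: a URL without an extension says nothing reliable about the actual format.

```python
import posixpath
from urllib.parse import urlsplit


def image_extension(url, default="jpg"):
    """Return the lowercase extension of the URL path, ignoring query and fragment."""
    # urlsplit() separates the path from the host, "?query" and "#fragment",
    # so dots in those parts are never mistaken for the extension.
    path = urlsplit(url).path
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    # Hypothetical fallback when the path carries no extension at all.
    return ext or default
```

For example, `image_extension("http://example.com/a/photo.JPG?w=200")` gives `"jpg"`, and `image_extension("http://example.com/image?id=3")` falls back to the default instead of returning part of the host name.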

But sometimes, not regularly (about one item in 100, I think), I get this error:

cannot identify image file '/PATH/NAME.jpg'

in the optimize_image function. When I check the image on disk, the file is there.
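Pillow raises "cannot identify image file" when none of its format plugins recognizes the beginning of the file, and a file existing on disk does not mean it contains an image: an HTTP error page saved under a `.jpg` name exists too. One way to check what the failing files really are is to look at their first bytes. A minimal Python 3 sketch; `sniff_bytes` and `sniff_file` are hypothetical helpers, and only the JPEG, PNG and GIF signatures (the formats mentioned in this thread) are recognized.

```python
# Leading bytes ("magic numbers") of the formats named in this thread.
SIGNATURES = (
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
)


def sniff_bytes(head):
    """Classify the first bytes of a downloaded file."""
    if not head:
        return "empty"
    for magic, kind in SIGNATURES:
        if head.startswith(magic):
            return kind
    # HTML error pages (and other markup) start with "<", possibly after whitespace.
    if head.lstrip().startswith(b"<"):
        return "markup"
    return "unknown"


def sniff_file(path, size=16):
    """Read only the first `size` bytes of path and classify them."""
    with open(path, "rb") as f:
        return sniff_bytes(f.read(size))
```

Logging `sniff_file(path)` in the `except` branch of `optimize_image` would show whether the roughly one-in-100 failures are empty files, markup, or some other format.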

I really don't understand it...

If you have any suggestions, I'd appreciate them.

Thanks in advance.

2015-05-08 magexcustomer

Have you seen this one? http://stackoverflow.com/questions/19230991/image-open-cannot-identify-image-file-python – selllikesybok

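Yes, I already have 'from PIL import Image', and I have the decoders for PIL (JPEG, PNG, GIF, etc.). I also tested with 'io.BytesIO(fd.read())', but it still didn't work. When I run 'pip freeze | grep -E "(Pillow|PIL)"' in the console, I only get _Pillow==2.8.1_ – magexcustomer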

I'm not sure yet, but it seems to be solved with:

import io

import requests
from PIL import Image
...
response = requests.get(img)
# Open the image from the downloaded bytes instead of from a file on disk
image = Image.open(io.BytesIO(response.content))
image.thumbnail((200, 200), Image.ANTIALIAS)
image.save(path, optimize=True, quality=85)

I'm continuing my tests.
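The same idea (download into memory first, and only touch the disk once the bytes look usable) can be sketched with the Python 3 standard library. `download_image_bytes` is a hypothetical helper; the `Content-Type` check is an added assumption about the failure cause, not something confirmed in this thread.

```python
import urllib.request


def download_image_bytes(url, timeout=30):
    """Fetch url into memory; raise ValueError unless the response claims to be an image."""
    # urlopen() raises urllib.error.HTTPError for 4xx/5xx HTTP responses,
    # so an error status is never returned as if it were the image.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        content_type = response.headers.get_content_type()
        data = response.read()
    # A server can still answer 200 with an HTML page instead of the image.
    if not content_type.startswith("image/"):
        raise ValueError("%s returned %s, not an image" % (url, content_type))
    if not data:
        raise ValueError("%s returned an empty body" % url)
    return data
```

The returned bytes can then go through `Image.open(io.BytesIO(data))` as in the answer above, and be written to disk only after Pillow accepted them. Some servers send real images as `application/octet-stream`, so the `Content-Type` check may reject valid files; relaxing it and relying on Pillow alone is the trade-off.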

2015-05-08 02:07:52 magexcustomer

Have you confirmed that this is the solution? I'm facing the same problem – jaime

Yes: Image.open(io.BytesIO(response.content)), instead of urllib.urlretrieve(img, path), solved my problem – magexcustomer
