2017-01-20 81 views

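Download a captcha image that has no file extension

How can I download this captcha image with PIL or another image-processing library? I have tried several approaches, but I cannot download the image.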

from PIL import Image
import urllib.request as urllib  # Python 3 replacement for urllib2
import io

fd = urllib.urlopen("https://notacarioca.rio.gov.br/senhaweb/CaptchaImage.aspx?guid=9759fc80-d385-480a-aa6e-8e00ef20be7b&s=1")
image_file = io.BytesIO(fd.read())
im = Image.open(image_file)
print(im)

Answer


The image you are trying to download does not have a static URL.

Working link: Link working. The same link, no longer working: Link not working.

This means you cannot reference the image through a static URL (urllib.urlopen("https://notacarioca.rio.gov.br/senhaweb/CaptchaImage.aspx?guid=9759fc80-d385-480a-aa6e-8e00ef20be7b&s=1") will not work).
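
A rough way to notice this situation in code, before PIL fails on the payload, is to look at the response's Content-Type header. The sketch below assumes (this is not confirmed for this site) that an expired captcha URL is answered with something other than an image, such as an HTML page. `looks_like_image` is a hypothetical helper name, not part of any library.

```python
def looks_like_image(content_type):
    """Return True if a Content-Type header value names an image type."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("image/")


# Example header values; no network access is needed to try the helper.
print(looks_like_image("image/jpeg"))                # True
print(looks_like_image("text/html; charset=utf-8"))  # False
```

With the code from the question, the header value would come from something like fd.headers.get("Content-Type").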

Here is my solution using Requests and BeautifulSoup:

import requests
from mimetypes import guess_extension
from bs4 import BeautifulSoup
from urllib.parse import urljoin
# from PIL import Image
# from io import BytesIO

# A session reuses cookies across the page request and the image request.
s = requests.session()
r = s.get("https://notacarioca.rio.gov.br/senhaweb/login.aspx")

if r.status_code == 200:
    soup = BeautifulSoup(r.content, "html.parser")
    # The captcha <img> sits inside this div; its src is not a static URL.
    div = soup.find("div", attrs={"class": "captcha", "style": "color:Red;width:100%;"})

    r = s.get(urljoin("https://notacarioca.rio.gov.br/senhaweb/", div.img["src"]))
    if r.status_code == 200:
        # Strip parameters such as "; charset=..." before guessing the extension.
        content_type = r.headers.get("content-type", "").split(";")[0].strip()
        guess = guess_extension(content_type)
        if guess:
            with open("captcha" + guess, "wb") as f:
                f.write(r.content)
            # Image.open(BytesIO(r.content)).show()
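
The code above writes nothing when guess_extension returns None, which happens if the Content-Type header is missing or unrecognized. As a possible fallback, the extension can be inferred from the first bytes of the payload, since PNG, GIF and JPEG files start with well-known signatures. This is a minimal sketch and not part of the original solution: `sniff_extension`, `SIGNATURES` and the ".bin" default are hypothetical names chosen for illustration.

```python
# Leading byte signatures of three common image formats.
SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
    (b"\xff\xd8\xff", ".jpg"),
)


def sniff_extension(data, default=".bin"):
    """Return a file extension based on the leading bytes of data."""
    for signature, extension in SIGNATURES:
        if data.startswith(signature):
            return extension
    # Unknown content: fall back to a neutral extension.
    return default


# Example with an in-memory GIF header instead of a real download.
print(sniff_extension(b"GIF89a" + b"\x00" * 10))  # .gif
```

In the solution above, it could be combined as guess = guess_extension(...) or sniff_extension(r.content), so that the file is still saved when the header gives no usable hint.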