Python: get a CAPTCHA file from HTML

I'm trying to decode a CAPTCHA in Python, but I don't know how to get the image out of the HTML. I use

html = session.get(page, headers=headers).text 
soup = BeautifulSoup(html, "html.parser")

and the HTML looks like

<img src="/captcha.gif" style="width:1px;height:1px"/>

How can I extract it? Is saving the image the only way to do this?

Source

2017-08-09 Petr Petrov

Possible duplicate of [Python: get image link from HTML](https://stackoverflow.com/questions/5927031/python-get-image-html)

You can do it like this:

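import os
import urllib.request
from bs4 import BeautifulSoup as BS

tag = '<img src="/captcha.gif" style="width:1px;height:1px"/>'
soup = BS(tag, 'html.parser')  # name the parser explicitly to avoid the bs4 warning
img_tag = soup.find('img')
# The src is relative, so prefix the site's absolute base URL before downloading;
# the image is saved as temp_img in the current working directory
urllib.request.urlretrieve('https://absolute/path/to' + img_tag['src'], os.path.join(os.getcwd(), 'temp_img'))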
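On the "is saving the image the only way" part of the question: not necessarily. Below is a minimal sketch, assuming `session` is the `requests.Session` from the question. The helper names `find_captcha_url` and `fetch_captcha_bytes` are made up for illustration, and matching on the substring "captcha" in `src` is an assumption based on the single `<img>` tag shown. It reuses the same session for the image request, since a CAPTCHA may be tied to the session cookies, and returns the image bytes without writing a file.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def find_captcha_url(html, page_url):
    """Return the absolute URL of the first <img> whose src mentions 'captcha', or None."""
    soup = BeautifulSoup(html, "html.parser")
    for img in soup.find_all("img", src=True):
        if "captcha" in img["src"].lower():
            # Resolve a relative src such as /captcha.gif against the page URL
            return urljoin(page_url, img["src"])
    return None


def fetch_captcha_bytes(session, page, headers=None):
    """Fetch the page and its CAPTCHA with the same session; return raw image bytes or None."""
    response = session.get(page, headers=headers)
    # response.url is the final URL after any redirects
    captcha_url = find_captcha_url(response.text, response.url)
    if captcha_url is None:
        return None
    image = session.get(captcha_url, headers=headers)
    image.raise_for_status()
    return image.content
```

The returned bytes can be handed to whatever decoder you use, or written to disk with `open(path, "wb")` if a file is needed after all.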

Source

2017-08-09 13:08:32

I get an error: `urllib.error.URLError:`. I checked whether the domain redirects with `print(session.get(page, headers=headers).url)`, and it returns `https://moskva.tiu.ru/Tehnicheskie-moyuschie-sredstva;50`. I can get the link to the image, which looks like `/captcha.gif`, but `urllib.request.urlretrieve(page + img_link, '/Users/elenadevyataykina/PycharmProjects/parsing_contacts/captcha/')` returns an error.

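I don't know how you got the CAPTCHA. I made more than 150 requests to the page `https://moskva.tiu.ru/Tehnicheskie-moyuschie-sredstva` without pausing and never hit a CAPTCHA page. Are you sure you really need CAPTCHA parsing? =)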

What is the value of `page` that you pass to `urllib.request.urlretrieve()`?
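Two things in the `urlretrieve(page + img_link, ...)` call from the comments look worth checking, though the thread does not establish that either one causes the `URLError`. String concatenation glues the root-relative image path onto the full page URL, and the second argument is a directory rather than a file name. A small sketch of both points, using the URL and link reported in the comments (`captcha_dir` is a hypothetical folder name):

```python
import os
from urllib.parse import urljoin

page = "https://moskva.tiu.ru/Tehnicheskie-moyuschie-sredstva;50"
img_link = "/captcha.gif"

# Plain concatenation appends the image path to the page path
concatenated = page + img_link
# urljoin resolves the root-relative link against the site root instead
resolved = urljoin(page, img_link)

# urlretrieve expects a file name as its second argument, not a directory;
# "captcha_dir" is a hypothetical folder name for this sketch
target = os.path.join("captcha_dir", os.path.basename(img_link))

print(concatenated)  # https://moskva.tiu.ru/Tehnicheskie-moyuschie-sredstva;50/captcha.gif
print(resolved)      # https://moskva.tiu.ru/captcha.gif
print(target)
```

With these values the download call would become `urllib.request.urlretrieve(resolved, target)`, provided the target folder already exists.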
