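"No schema supplied" and other errors when using requests.get()
I'm working through Automate the Boring Stuff with Python. This program is supposed to go to http://xkcd.com/ and download all the images for offline viewing.
I'm on Python 2.7 on a Mac.
For some reason, I'm getting errors such as "No schema supplied", along with errors from requests.get() itself.
Here is my code: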
# Saves the XKCD comic page for offline read
import requests, os, bs4, shutil

url = 'http://xkcd.com/'

if os.path.isdir('xkcd') == True:  # If the xkcd folder already exists
    shutil.rmtree('xkcd')  # delete it
else:  # otherwise
    os.makedirs('xkcd')  # Create the xkcd folder.

while not url.endswith('#'):  # When there are no more posts, the url ends with '#', so exit the while loop
    # Download the page
    print 'Downloading %s page...' % url
    res = requests.get(url)  # Get the page
    res.raise_for_status()  # Check for errors
    soup = bs4.BeautifulSoup(res.text)  # Parse the downloaded page

    # Find the URL of the comic image
    comicElem = soup.select('#comic img')  # Any '#comic img' it finds is saved as a list in comicElem
    if comicElem == []:  # If the list is empty
        print 'Couldn\'t find the image!'
    else:
        # Take the first item in comicElem (the image) and save its src to comicUrl
        comicUrl = comicElem[0].get('src')
        # Download the image
        print 'Downloading the %s image...' % (comicUrl)
        res = requests.get(comicUrl)  # Get the image. Getting something will always use requests.get()
        res.raise_for_status()  # Check for errors

        # Save image to ./xkcd
        imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')
        for chunk in res.iter_content(10000):
            imageFile.write(chunk)
        imageFile.close()

    # Get the Prev button's URL
    prevLink = soup.select('a[rel="prev"]')[0]
    # The Previous button is the first <a rel="prev" href="/1535/" accesskey="p">< Prev</a>
    url = 'http://xkcd.com/' + prevLink.get('href')
    # adds /1535/ to http://xkcd.com/

print 'Done!'
Here is the error:
Traceback (most recent call last):
  File "/Users/XKCD.py", line 30, in <module>
    res = requests.get(comicUrl) # Get the image. Getting something will always use requests.get()
  File "/Library/Python/2.7/site-packages/requests/api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "/Library/Python/2.7/site-packages/requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/Library/Python/2.7/site-packages/requests/sessions.py", line 451, in request
    prep = self.prepare_request(req)
  File "/Library/Python/2.7/site-packages/requests/sessions.py", line 382, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/Library/Python/2.7/site-packages/requests/models.py", line 304, in prepare
    self.prepare_url(url, params)
  File "/Library/Python/2.7/site-packages/requests/models.py", line 362, in prepare_url
    to_native_string(url, 'utf8')))
requests.exceptions.MissingSchema: Invalid URL '//imgs.xkcd.com/comics/the_martian.png': No schema supplied. Perhaps you meant http:////imgs.xkcd.com/comics/the_martian.png?
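The thing is, I've read the section of the book about this program several times, read the Requests docs, and looked at other questions here. My syntax looks right.
Thanks for your help!
Edit:
This didn't work: comicUrl = ("http:" + comicElem[0].get('src')). I thought adding http: in front would get rid of the "No schema supplied" error.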
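The URL in the traceback, '//imgs.xkcd.com/comics/the_martian.png', is scheme-relative: it names a host but no "http:" prefix, which is what Requests rejects. One plausible reason that prepending "http:" by hand might still fail (an assumption, not confirmed by anything in this post) is that some pages could use a path-only src such as "/s/example.png", which would become "http:/s/example.png". A minimal sketch of a more general approach is to resolve the src against the page URL with the standard library's urljoin. The helper name absolute_image_url and the path-only example are hypothetical and used only for illustration:

```python
# urljoin lives in different modules on Python 2 and Python 3.
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2.7

PAGE_URL = 'http://xkcd.com/'  # the page the <img> tag was scraped from


def absolute_image_url(src, page_url=PAGE_URL):
    """Hypothetical helper: resolve an <img> src against the page URL."""
    return urljoin(page_url, src)


# Scheme-relative src, as in the traceback: inherits "http" from the page URL.
print(absolute_image_url('//imgs.xkcd.com/comics/the_martian.png'))

# Path-only src (made-up example): gets both the scheme and the host.
print(absolute_image_url('/s/example.png'))

# An already-absolute src passes through unchanged.
print(absolute_image_url('http://imgs.xkcd.com/comics/the_martian.png'))
```

In the script above, this would mean passing absolute_image_url(comicElem[0].get('src'), url) to requests.get() instead of the raw src. The same call could probably also build the Prev link, since urljoin('http://xkcd.com/', '/1535/') avoids the doubled slash that plain string concatenation produces.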
https://gist.github.com/auscompgeek/5218149 – Ajay
It uses urllib2 and looks long and complicated, as always :D
http://paste.ofcode.org/ZdXRAmTv3t9q9gYtv9eVDN – Ajay