2015-06-11

I'm learning Python from Automate the Boring Stuff with Python. This program is supposed to go to http://xkcd.com/ and download all the comic images for offline viewing, but requests.get() fails with "No schema supplied" and other errors.

I'm on Python 2.7, on a Mac.

For some reason I keep getting errors such as "No schema supplied" from requests.get() itself.

Here is my code:

# Saves the XKCD comic page for offline read 

import requests, os, bs4, shutil 

url = 'http://xkcd.com/' 

if os.path.isdir('xkcd') == True: # If xkcd folder already exists 
    shutil.rmtree('xkcd') # delete it 
else: # otherwise 
    os.makedirs('xkcd') # Creates xkcd folder.


while not url.endswith('#'): # When there are no more posts, the URL will end with '#'; exit the while loop
    # Download the page 
    print 'Downloading %s page...' % url 
    res = requests.get(url) # Get the page 
    res.raise_for_status() # Check for errors 

    soup = bs4.BeautifulSoup(res.text) # Parse the page
    # Find the URL of the comic image 
    comicElem = soup.select('#comic img') # Any #comic img it finds will be saved as a list in comicElem 
    if comicElem == []: # if the list is empty 
     print 'Couldn\'t find the image!' 
    else: 
     comicUrl = comicElem[0].get('src') # Get the first index in comicElem (the image) and save to 
     # comicUrl 

     # Download the image 
     print 'Downloading the %s image...' % (comicUrl) 
     res = requests.get(comicUrl) # Get the image. Getting something will always use requests.get() 
     res.raise_for_status() # Check for errors 

     # Save image to ./xkcd 
     imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb') 
     for chunk in res.iter_content(10000): 
      imageFile.write(chunk) 
     imageFile.close() 
    # Get the Prev btn's URL 
    prevLink = soup.select('a[rel="prev"]')[0] 
    # The Previous button is first <a rel="prev" href="/1535/" accesskey="p">&lt; Prev</a> 
    url = 'http://xkcd.com/' + prevLink.get('href') 
    # adds /1535/ to http://xkcd.com/ 

print 'Done!' 
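One side note on the folder setup in the code above: the if/else deletes an existing xkcd folder but only creates it when it did NOT already exist, so on a second run the downloads have nowhere to go. A minimal sketch of a fix (delete if present, then always recreate):

```python
import os, shutil

# Delete a leftover folder from a previous run, if any...
if os.path.isdir('xkcd'):
    shutil.rmtree('xkcd')
# ...then always create a fresh one, instead of only in the else branch.
os.makedirs('xkcd')

print(os.path.isdir('xkcd'))  # -> True on every run
```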

Here is the error:

Traceback (most recent call last): 
    File "/Users/XKCD.py", line 30, in <module> 
    res = requests.get(comicUrl) # Get the image. Getting something will always use requests.get() 
    File "/Library/Python/2.7/site-packages/requests/api.py", line 69, in get 
    return request('get', url, params=params, **kwargs) 
    File "/Library/Python/2.7/site-packages/requests/api.py", line 50, in request 
    response = session.request(method=method, url=url, **kwargs) 
    File "/Library/Python/2.7/site-packages/requests/sessions.py", line 451, in request 
    prep = self.prepare_request(req) 
    File "/Library/Python/2.7/site-packages/requests/sessions.py", line 382, in prepare_request 
    hooks=merge_hooks(request.hooks, self.hooks), 
    File "/Library/Python/2.7/site-packages/requests/models.py", line 304, in prepare 
    self.prepare_url(url, params) 
    File "/Library/Python/2.7/site-packages/requests/models.py", line 362, in prepare_url 
    to_native_string(url, 'utf8'))) 
requests.exceptions.MissingSchema: Invalid URL '//imgs.xkcd.com/comics/the_martian.png': No schema supplied. Perhaps you meant http:////imgs.xkcd.com/comics/the_martian.png? 

The thing is, I've read the relevant section of the book several times, read the requests docs, and looked at other questions on here. My syntax looks fine.

Thanks for your help!

EDIT:

This doesn't work: comicUrl = ("http:" + comicElem[0].get('src')). I thought prepending http: would get rid of the "No schema supplied" error.
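For what it's worth, the failing URLs in this thread are two different kinds of relative reference: the traceback shows a scheme-relative src (//imgs.xkcd.com/...), while the comment further down shows a page-relative one (/1525/bg.png), and prepending "http:" only repairs the first kind. A sketch using the standard library's urljoin, which resolves both against the page URL (it lives in urlparse on Python 2.7 and urllib.parse on Python 3):

```python
from urllib.parse import urljoin  # Python 2.7: from urlparse import urljoin

page = 'http://xkcd.com/1525/'

# Scheme-relative src, as in the traceback above:
print(urljoin(page, '//imgs.xkcd.com/comics/the_martian.png'))
# -> http://imgs.xkcd.com/comics/the_martian.png

# Page-relative src, as on some special pages:
print(urljoin(page, '/1525/bg.png'))
# -> http://xkcd.com/1525/bg.png
```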

https://gist.github.com/auscompgeek/5218149 – Ajay

That uses urllib2, and it looks long and complicated as always :D –

http://paste.ofcode.org/ZdXRAmTv3t9q9gYtv9eVDN – Ajay

Answers

Answer (4 votes)

Change your comicUrl to this:

comicUrl = comicElem[0].get('src').strip("http://") 
comicUrl="http://"+comicUrl 
if 'xkcd' not in comicUrl: 
    comicUrl=comicUrl[:7]+'xkcd.com/'+comicUrl[7:] 

print "comic url",comicUrl 
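A caveat on the .strip("http://") call above: str.strip removes any leading and trailing characters drawn from the given character set (h, t, p, :, /), not the literal prefix, so it can eat into the URL itself. That may be related to the trouble reported in the comments below. A small demonstration, with a prefix-safe alternative:

```python
# strip() treats its argument as a set of characters, not as a prefix:
url = 'https://imgs.xkcd.com/comics/x.png'
print(url.strip('http://'))   # -> s://imgs.xkcd.com/comics/x.png  ('s' is not in the set, so stripping stops there)

# A prefix-safe alternative is an explicit check:
if not url.startswith('http'):
    url = 'http:' + url
print(url)                    # unchanged here: https://imgs.xkcd.com/comics/x.png
```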
It worked for a bit, but then it got stuck at http://xkcd.com/1525/ –

The error was the "No schema supplied" error again. Specifically: requests.exceptions.MissingSchema: Invalid URL 'http:/1525/bg.png': No schema supplied. Perhaps you meant http://http:/1525/bg.png? –

@LoiHuynh this should work – Ajay

Answer (4 votes)

"No schema" means you didn't supply http:// or https://. Supply one of these and it will do the trick.

EDIT: Take a look at this URL string!:

URL '//imgs.xkcd.com/comics/the_martian.png':

But I'm not passing it a URL; I'm going through the HTML document and finding comicElem = soup.select('#comic img'). –

Yes, but in the HTML it will be a relative URL - requests needs an absolute one - try this: comicUrl = "http://imgs.xkcd.com/comics/" + comicElem[0].get('src') or some variant. – John

I tried Ajay's suggestion, which is similar to yours, and I got the schema error –

Answer (0 votes)

Just wanted to chime in here: I had this exact same error and used @Ajay's suggested answer above, but even after adding that I was still running into problems. Right after the program downloaded the first image, it would stop and return this error:

ValueError: Unsupported or invalid CSS selector: "a[rel" 

This refers to the last line in the program, which uses the "Prev" button to go to the next image to download.

After going through the BS4 documentation, I made the slight change below, and it seems to work just fine now:

prevLink = soup.select('a[rel^="prev"]')[0] 

Someone else might run into the same problem, so I thought I'd add this comment.

Answer (1 vote)

Explanation:

A few XKCD pages have special content that isn't a simple image file. That's fine; you can just skip them. If your selector doesn't find any elements, soup.select('#comic img') will return an empty list.
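The empty-list check can be illustrated with a plain list, independent of bs4 (assuming select() matched nothing):

```python
# soup.select() returns a list of matching elements; when nothing matches,
# the list is empty. Both checks below detect that case; the second is the
# more idiomatic Python, because empty sequences are falsy.
comicElem = []              # simulates soup.select('#comic img') with no match
print(comicElem == [])      # -> True
print(not comicElem)        # -> True
```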

Working code:

import requests, os, bs4, shutil

url = 'http://xkcd.com'

# make a fresh folder: delete a leftover one, then always recreate it
if os.path.isdir('xkcd'):
    shutil.rmtree('xkcd')
os.makedirs('xkcd')


# scraping information
while not url.endswith('#'):
    print('Downloading Page %s.....' % (url))
    res = requests.get(url)   # getting page
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text, 'html.parser')

    comicElem = soup.select('#comic img')  # getting img tag under comic division
    if not comicElem:                      # if not found, print error
        print('could not find comic image')

    else:
        try:
            comicUrl = 'http:' + comicElem[0].get('src')    # getting comic url and then downloading its image
            print('Downloading image %s.....' % (comicUrl))
            res = requests.get(comicUrl)
            res.raise_for_status()

        except requests.exceptions.MissingSchema:
            # skip if not a normal image file
            prev = soup.select('a[rel="prev"]')[0]
            url = 'http://xkcd.com' + prev.get('href')
            continue

        imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')  # write downloaded image to disk
        for chunk in res.iter_content(10000):
            imageFile.write(chunk)
        imageFile.close()

        # get previous link and update url
        prev = soup.select('a[rel="prev"]')[0]
        url = 'http://xkcd.com' + prev.get('href')


print('Done...')