Cache Access Denied. Authentication Required in requests module

I am trying to make a basic web crawler. My internet connection goes through a proxy, so I used the solution given here. But I still get an error when running the code. My code is:

#!/usr/bin/python3.4
import requests
from bs4 import BeautifulSoup

# Proxy URLs with credentials, used for both HTTP and HTTPS requests
proxies = {
    "http": r"http://usr:[email protected]:3128",
    "https": r"http://usr:[email protected]:3128",
}

url = input("Ask user for something")

def santabanta(max_pages, url):
    page = 1
    while page <= max_pages:
        # Fetch the page through the proxy
        source_code = requests.get(url, proxies=proxies)
        plain_text = source_code.text
        print(plain_text)
        # Parse the HTML and print the target of every link
        soup = BeautifulSoup(plain_text, "lxml")
        for link in soup.find_all('a'):
            href = link.get('href')
            print(href)
        page = page + 1

santabanta(1, url)

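But when I run it in a terminal on Ubuntu 14.04, I get the following error:

The following error was encountered while trying to retrieve the URL: http://www.santabanta.com/wallpapers/gauhar-khan/?

Cache Access Denied.

Sorry, you are not currently allowed to request http://www.santabanta.com/wallpapers/gauhar-khan/? from this cache until you have authenticated yourself.

The URL I entered is: http://www.santabanta.com/wallpapers/gauhar-khan/

Please help me.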

2016-02-13 Kevin Pandya

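Open the URL.
Press F12 (Chrome users).
Now go to the "Network" tab in the panel that opens at the bottom.
Press F5 to reload the page so that Chrome records all the data received from the server.
Open any of the received files and scroll down to "Request Headers".
Pass all of those headers to requests.get().

[Here is an image to help you][1]

[1]: http://i.stack.imgur.com/zUEBE.png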

Make the headers as follows:

headers = {
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate, sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Proxy-Authorization': 'Basic ZWRjZ3Vlc3Q6ZWRjZ3Vlc3Q=',
    'If-Modified-Since': 'Fri, 13 Nov 2015 17:47:23 GMT',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36'
}
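
The Proxy-Authorization value in these headers is the string "Basic " followed by the base64 encoding of username:password, so it can be generated instead of copied from the browser. The following is a minimal sketch of that; the function name proxy_auth_header and the credentials "usr" / "pwd" are placeholders I made up for illustration, not values from the question.

```python
import base64


def proxy_auth_header(username, password):
    """Return a dict holding a Basic Proxy-Authorization header."""
    # Basic auth is "username:password", base64-encoded
    credentials = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Proxy-Authorization": "Basic " + token}


# Hypothetical placeholder credentials; replace with your own proxy login.
headers = {
    "Accept": "*/*",
    "Connection": "keep-alive",
}
headers.update(proxy_auth_header("usr", "pwd"))
```

The resulting dict can then be passed as requests.get(url, headers=headers, proxies=proxies). As far as I know this only helps for plain http:// URLs: for https:// URLs the request headers travel inside the tunnel and the proxy does not see them, so the credentials need to be in the proxy URL instead.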

2016-02-20 05:42:49

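There is another way to solve this problem.
What you can do is make your Python script use environment variables.

Open a terminal (CTRL + ALT + T) and define the proxy there: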

export http_proxy="http://usr:[email protected]:port"
export https_proxy="https://usr:[email protected]:port"

Once the proxy is defined this way, remove the proxies definition from the code.
Here is the changed code:

#!/usr/bin/python3.4
import requests
from bs4 import BeautifulSoup

url = input("Ask user for something")

def santabanta(max_pages, url):
    page = 1
    while page <= max_pages:
        # No proxies argument: requests picks up http_proxy/https_proxy from the environment
        source_code = requests.get(url)
        plain_text = source_code.text
        print(plain_text)
        # Parse the HTML and print the target of every link
        soup = BeautifulSoup(plain_text, "lxml")
        for link in soup.find_all('a'):
            href = link.get('href')
            print(href)
        page = page + 1

santabanta(1, url)
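
If the error persists, it may be worth checking that the exported variables are actually visible to the Python process (they are lost if the script is started from a different terminal or via sudo). A small sketch using only the standard library follows; the function name env_proxies and the proxy address proxy.example:3128 with the login usr / pwd are hypothetical, and the os.environ lines only stand in for the export commands above.

```python
import os
import urllib.request

# Hypothetical values; normally these come from the `export` lines in the shell.
os.environ["http_proxy"] = "http://usr:pwd@proxy.example:3128"
os.environ["https_proxy"] = "http://usr:pwd@proxy.example:3128"


def env_proxies():
    """Return the http/https proxies that Python sees in the environment."""
    detected = urllib.request.getproxies()
    # Keep only the two schemes the crawler uses
    return {scheme: detected[scheme] for scheme in ("http", "https") if scheme in detected}


for scheme, proxy_url in env_proxies().items():
    print(scheme, "->", proxy_url)
```

requests reads the same variables by default, so if nothing is printed for a scheme here, requests.get(url) will most likely bypass the proxy settings for that scheme as well.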

2016-08-22 12:54:02 alphaguy
