2014-01-07 71 views

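Receiving 404 with Python requests, wget gets the correct page

I am trying to retrieve a web page that must be accessed from behind a proxy and that also requires HTTP authentication: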

$ wget -d --user=atwood --ask-password http://example.com/admin/admin.php 

This works fine; I will paste the HTTP headers (request and response) below.

Retrieving the same page with python-requests returns a 404 error.

Here is the Python code. It follows the great method for debugging the requests library that user Inactivist posted earlier:

import requests
from pprint import pformat

url = 'http://example.com/admin/admin.php'
proxy_config = { 
    'http': '1.2.3.4', 
    'https': '1.2.3.4', 
    'ftp': '1.2.3.4' 
} 
head = { 
    'User-Agent': 'Wget/1.13.4 (linux-gnu)', 
    'Connection': 'Close', 
    'Proxy-Connection': 'Keep-Alive' 
} 

response = requests.get(url, auth=('atwood', 'hunter2'), proxies=proxy_config, headers=head) 

print("Status code: %s" % (response.status_code,)) 
print("URL: %s" % (response.url,)) 
print(pformat(response.text)) 

Below are the wget HTTP headers (request and response) from the run that does return the requested page correctly:

$ export http_proxy=http://1.2.3.4:3128 
$ wget -d --user=atwood --ask-password http://example.com/admin/admin.php 
Setting --user (user) to atwood 
Setting --ask-password (askpassword) to 1 
Password for user `atwood': 
DEBUG output created by Wget 1.13.4 on linux-gnu. 

URI encoding = `UTF-8' 
URI encoding = `UTF-8' 
--2014-01-07 11:15:59-- http://example.com/admin/admin.php 
Host `example.com' has not issued a general basic challenge. 
Connecting to 1.2.3.4:3128... connected. 
Created socket 3. 
Releasing 0x000000000159bf20 (new refcount 0). 
Deleting unused 0x000000000159bf20. 

---request begin--- 
GET http://example.com/admin/admin.php HTTP/1.1 
User-Agent: Wget/1.13.4 (linux-gnu) 
Accept: */* 
Host: example.com 
Connection: Close 
Proxy-Connection: Keep-Alive 

---request end--- 
Proxy request sent, awaiting response... 
---response begin--- 
HTTP/1.0 401 Unauthorized 
Date: Tue, 07 Jan 2014 09:16:00 GMT 
Server: Apache/2.2.21 (Linux/SUSE) 
X-Powered-By: PHP/5.3.8 
WWW-Authenticate: Basic realm="CONTACT-ADMIN" 
Content-Length: 43 
Content-Type: text/html 
X-Cache: MISS from proxyServer 
X-Cache-Lookup: MISS from proxyServer:3128 
Via: 1.0 proxyServer (squid/3.1.19) 
Connection: keep-alive 

---response end--- 
401 Unauthorized 
Registered socket 3 for persistent reuse. 
Skipping 43 bytes of body: [Login incorrect, please try again: |||BAD| 
] done. 
Inserted `example.com' into basic_authed_hosts 
Reusing existing connection to 1.2.3.4:3128. 
Reusing fd 3. 

---request begin--- 
GET http://example.com/admin/admin.php HTTP/1.1 
User-Agent: Wget/1.13.4 (linux-gnu) 
Accept: */* 
Host: example.com 
Connection: Close 
Proxy-Connection: Keep-Alive 
Authorization: Basic NjY2Njp0cmlwczEyMw== 

---request end--- 
Proxy request sent, awaiting response... 
---response begin--- 
HTTP/1.0 200 OK 
Date: Tue, 07 Jan 2014 09:16:00 GMT 
Server: Apache/2.2.21 (Linux/SUSE) 
X-Powered-By: PHP/5.3.8 
Cache-Control: no-cache, must-revalidate 
Pragma: no-cache 
Content-Type: text/html; charset=utf-8 
X-Cache: MISS from proxyServer 
X-Cache-Lookup: MISS from proxyServer:3128 
Via: 1.0 proxyServer (squid/3.1.19) 
Connection: close 

---response end--- 
200 OK 
URI content encoding = `utf-8' 
Length: unspecified [text/html] 
Saving to: `admin.php' 

    [ <=>       ] 14,096  --.-K/s in 0.1s 

2014-01-07 11:16:00 (92.8 KB/s) - `admin.php' saved [14096] 

You may notice that I anonymized the URL I am fetching. I have triple-checked that the URL returning the 404 is exactly the same URL that works in wget.

Is the difference in proxy port just a result of anonymizing the URL? – Evert

'--user=atwood' ... I see what you did there. – DaSourcerer

You are right! I did not set the proxy port in Python! – dotancohen

Answer

It looks like your proxy port in Python is not the same as the one wget uses (3128 versus, I guess, a default of 8080).
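As a minimal sketch of the fix, assuming the proxy is the same Squid instance that wget reaches via http_proxy=http://1.2.3.4:3128, the proxy URL passed to requests can carry an explicit scheme and port. The helper names build_proxies and fetch are hypothetical and only for illustration; the address, port, and credentials are the anonymized values from the question.

```python
PROXY_HOST = "1.2.3.4"  # anonymized proxy address from the question
PROXY_PORT = 3128       # port taken from the wget session's http_proxy value


def build_proxies(host, port):
    """Return a requests-style proxies mapping with an explicit scheme and port."""
    proxy_url = "http://%s:%d" % (host, port)
    # Assumption: the same HTTP proxy also handles https traffic.
    return {"http": proxy_url, "https": proxy_url}


def fetch(url, user, password, proxies):
    """GET url through the given proxies using HTTP Basic authentication."""
    # Imported here so that build_proxies can be used without requests installed.
    import requests

    return requests.get(url, auth=(user, password), proxies=proxies, timeout=10)


proxy_config = build_proxies(PROXY_HOST, PROXY_PORT)
print(proxy_config)
```

Calling fetch('http://example.com/admin/admin.php', 'atwood', 'hunter2', proxy_config) should then go through the same proxy endpoint as wget. Alternatively, if the proxies argument is left out entirely, requests by default picks up the http_proxy environment variable, which would keep both tools on the same setting.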

Thanks, that was it! – dotancohen