2011-08-09

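Why can't my Python script download a web page through a proxy?

I'm new to Python and trying my luck with sockets. I wrote a simple HTTP client, but to my surprise it can't fetch a page that Firefox can, even though both send the same headers.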

import socket 
clientsocket= socket.socket(socket.AF_INET, socket.SOCK_STREAM) 
clientsocket.connect(("213.229.83.205",80))#connect to proxy at given address 
print "connected to 213.229.83.205" 
sdata= """GET http://google.co.ug/ HTTP/1.1 
Host: google.co.ug 
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20100101 Firefox/6.0 
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 
Accept-Language: en-us,en;q=0.5 
Accept-Encoding: gzip, deflate 
Proxy-Connection: keep-alive 
Cookie: cookie <-- Real cookie deleted 

""" 
print "sending request" 
clientsocket.send(sdata); 
rdata=clientsocket.recv(10240) 
if not rdata: print "no data found" 
else: 
    print "receiving data !" 
    myfile=open("c:/users/markdenis/desktop/google.html","w") 
    myfile.write(str(rdata)) 
    myfile.close() 
    print "data written to file on desktop" 
clientsocket.close() 
raw_input()#system(pause) 

When I run it, it prints:

connected to 213.229.83.205 
sending request 
no data found 

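There is a glype proxy running at the address above. –

Are you sure the line breaks between your header lines and after the headers are '\r\n'? Some servers require it (most of them, in my experience). – Skurmedel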

What is the goal of your code? Is there a particular reason you aren't using urllib2? – Kracekumar
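For comparison, the urllib2 route suggested in this comment looks roughly like the following with `urllib.request`, urllib2's Python 3 successor. This is a minimal sketch, not the asker's code: it assumes Python 3, `make_proxy_opener` is a hypothetical helper name, and it assumes the server at 213.229.83.205:80 (taken from the question) behaves as a standard HTTP forward proxy, which may not hold for the glype proxy mentioned above.

```python
import urllib.request


def make_proxy_opener(proxy_url):
    """Build an opener that sends plain-HTTP requests through proxy_url.

    make_proxy_opener is a hypothetical helper for illustration.
    """
    # ProxyHandler maps a URL scheme ("http") to the proxy that carries it.
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    # Passing our own ProxyHandler replaces the default, environment-based one.
    return urllib.request.build_opener(handler)
```

Usage would be along the lines of `opener = make_proxy_opener("http://213.229.83.205:80")`, then `opener.addheaders = [("User-Agent", "Mozilla/5.0 ...")]` and `html = opener.open("http://google.co.ug/", timeout=10).read()`. The library then takes care of CRLF line endings, reading the full response, and proxy request formatting.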

Answer

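The HTTP protocol requires \r\n at the end of every header line, plus a blank line (a bare \r\n) at the end of the headers. You aren't explicit about the line endings in your sdata buffer, so it ends up with bare \n line terminators only.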
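The rule above can be sketched as a small request builder that joins the lines with "\r\n" explicitly, so the line endings never depend on how the source file was typed. This is a minimal sketch assuming Python 3 (where sockets take bytes); `build_request` is a hypothetical helper, not part of any library.

```python
def build_request(method, target, host, headers):
    """Return an HTTP/1.1 request head as bytes, using CRLF line endings.

    headers is a list of (name, value) pairs sent after the Host header.
    """
    lines = [f"{method} {target} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in headers]
    # CRLF between lines, then CRLF CRLF: one to end the last header
    # and one for the empty line that terminates the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```

Taking a list of pairs rather than a dict keeps the header order explicit and allows a header name to appear more than once.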

I tested this on Windows, Linux and OS X to be sure:

>>> x = """a
... b
... c
... """
>>> x
'a\nb\nc\n'

What you need is:

>>> x = "a\r\nb\r\nc\r\n"
>>> x
'a\r\nb\r\nc\r\n'
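If you would rather keep the readable triple-quoted layout, one alternative (not the approach this answer takes) is to write the block with plain newlines and convert it just before sending. A minimal Python 3 sketch; `to_crlf` is a hypothetical helper name:

```python
def to_crlf(block):
    """Convert a newline-separated header block to CRLF-terminated bytes."""
    # strip() drops the blank lines at the start/end of the literal;
    # splitlines() splits on \n (and on \r\n, should any be present);
    # rstrip() removes trailing spaces, which a request line must not have.
    lines = [line.rstrip() for line in block.strip().splitlines()]
    # One CRLF after each line, plus the empty line that ends the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```

Because the blank lines at the edges are stripped first, the result ends in exactly one empty line no matter how many newlines precede the closing `"""`.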

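Add the \r\ns and give it a shot. Typing them directly into the triple-quoted buffer would leave you with an extra \n on every line, so build the string up piece by piece: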

sdata = "GET http://google.co.ug/ HTTP/1.1\r\n" 
sdata += "Host: google.co.ug\r\n" 
sdata += "User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20100101 Firefox/6.0\r\n" 
sdata += "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n" 
sdata += "Accept-Language: en-us,en;q=0.5\r\n" 
sdata += "Accept-Encoding: gzip, deflate\r\n" 
sdata += "Proxy-Connection: keep-alive\r\n" 
sdata += "\r\n"
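Even with correct line endings, the question's single `recv(10240)` call returns at most 10240 bytes and may return fewer, so a longer page can arrive truncated. Below is a minimal Python 3 sketch (hypothetical helper `send_and_read_all`) that sends the whole request with `sendall()` and keeps calling `recv()` until the peer closes the connection. It assumes the request asks the server to close the connection afterwards (for example `Connection: close` instead of `Proxy-Connection: keep-alive`); with keep-alive, the loop would wait until the server's idle timeout.

```python
import socket


def send_and_read_all(sock, request, bufsize=4096):
    """Send request (bytes) on a connected socket and return the full reply."""
    sock.sendall(request)  # unlike send(), retries until every byte is written
    chunks = []
    while True:
        chunk = sock.recv(bufsize)  # returns b"" once the peer has closed
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)
```

For example: `with socket.create_connection(("213.229.83.205", 80), timeout=10) as s: data = send_and_read_all(s, sdata.encode("ascii"))`. The timeout makes `recv()` raise `socket.timeout` instead of blocking forever if the server never closes the connection.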