2014-03-30 36 views
0

I am trying to implement HTTP banner grabbing. I wrote this:

import socket

# ip_address is assumed to hold the target server's IP address as a string
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(2)
s.connect((ip_address, 80))

byte = str.encode("Server:\r\n")
s.send(byte)
banner = s.recv(1024)
print(banner)

It should print a Bad Request message with details about the server, but instead it prints the HTML that my browser would show.

+0

Related: How do you send a HEAD HTTP request in Python? (http://stackoverflow.com/q/107405/4279) – jfs

Answers

2

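When an HTTP web server receives something from a client in place of an HTTP method (for example Server:\r\n) that has no meaning to the server, it may return a response that contains both headers and content.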
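To make the point concrete, here is a minimal sketch of why `Server:\r\n` is not a meaningful request. It assumes a simplified pattern for an HTTP/1.x request line (method, space, target, space, version); the name `looks_like_request_line` is hypothetical, and real servers apply more rules than this.

```python
import re

# Assumption: simplified shape of an HTTP/1.x request line,
# "METHOD SP request-target SP HTTP-version". Real servers check more.
REQUEST_LINE = re.compile(r"^[A-Z]+ \S+ HTTP/\d\.\d$")


def looks_like_request_line(raw):
    """Return True if the first line of `raw` resembles a request line."""
    first_line = raw.split("\r\n", 1)[0]
    return REQUEST_LINE.match(first_line) is not None


# The payload from the question is a bare header name, not a request line.
print(looks_like_request_line("Server:\r\n"))
# A HEAD request starts with a well-formed request line.
print(looks_like_request_line("HEAD / HTTP/1.1\r\n"))
```

A server that cannot parse the first line as a request typically answers with an error status, which is where the 4xx rule quoted next comes in.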

4xx Client Error

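The 4xx class of status code is intended for cases in which the client seems to have erred. Except when responding to a HEAD request, the server SHOULD include an entity containing an explanation of the error situation, and whether it is a temporary or permanent condition. These status codes are applicable to any request method. User agents SHOULD display any included entity to the user.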

So if you want only the header section for banner grabbing, send an HTTP HEAD request.

Here is an example:

import socket


def http_banner_grabber(ip, port=80, method="HEAD",
                        timeout=60, http_type="HTTP/1.1"):
    assert method in ['GET', 'HEAD']
    # @see: http://stackoverflow.com/q/246859/538284
    assert http_type in ['HTTP/0.9', 'HTTP/1.0', 'HTTP/1.1']
    cr_lf = '\r\n'
    lf_lf = '\n\n'
    crlf_crlf = cr_lf + cr_lf
    res_sep = ''
    # how many bytes to read from the socket on each recv() call
    rec_chunk = 4096
    s = socket.socket()
    s.settimeout(timeout)
    s.connect((ip, port))
    # the request line looks like 'HEAD / HTTP/1.1\r\n'
    req_data = "{} / {}{}".format(method, http_type, cr_lf)
    # for an HTTP/1.1 request,
    if http_type == "HTTP/1.1":
        # the Host header is required (we send the ip instead of a host name here!)
        # adds a header like 'Host: google.com:80\r\n' to req_data
        req_data += 'Host: {}:{}{}'.format(ip, port, cr_lf)
        # ask the server to close the connection after it responds
        # adds 'Connection: close\r\n' to req_data
        req_data += "Connection: close{}".format(cr_lf)
    # headers are joined with '\r\n' and the header block ends with '\r\n\r\n',
    # so append one more '\r\n' to req_data
    req_data += cr_lf
    # s.send() may send only part of the data, so use s.sendall()
    s.sendall(req_data.encode())
    res_data = b''
    # the default maximum header size differs between web servers: 4k, 8k, 16k
    # @see: http://stackoverflow.com/a/8623061/538284
    # s.recv(n) may return fewer than n bytes, so call it in a loop
    while True:
        try:
            chunk = s.recv(rec_chunk)
        except socket.error:
            break
        if not chunk:
            break
        res_data += chunk
    s.close()
    if res_data:
        # decode `res_data` after reading the whole response;
        # errors='replace' keeps non-UTF-8 bytes from raising
        res_data = res_data.decode(errors='replace')
    else:
        return '', ''
    # detect the header/body separator: '\r\n\r\n' or '\n\n'
    if crlf_crlf in res_data:
        res_sep = crlf_crlf
    elif lf_lf in res_data:
        res_sep = lf_lf
    # for request types below HTTP/1.0, or servers that do not support the
    # request type, the server may send just a body without headers
    if not res_sep or res_data.startswith('<'):
        return '', res_data
    # split `HEADER\r\n\r\nBODY` (or `HEADER\n\nBODY`) at the first separator
    banner, body = res_data.split(res_sep, 1)
    return banner, body

Demo:

addresses = {
    'google.com': '216.239.32.20',
    'msdn.microsoft.com': '157.56.148.19',
}

for domain, ip in addresses.items(): 
    banner, body = http_banner_grabber(ip) 
    print('*' * 24) 
    print(domain, ip, 'HEAD HTTP/1.1') 
    print(banner) 

You can also try it with the GET method and with other options:

for domain, ip in addresses.items(): 
    banner, body = http_banner_grabber(ip, method="GET", http_type='HTTP/0.9') 
    print('*' * 24) 
    print(domain, ip, 'GET HTTP/0.9') 
    print(banner) 

Output (first example):

************************ 
google.com 216.239.32.20 HEAD HTTP/1.1 
HTTP/1.1 200 OK 
Date: Mon, 31 Mar 2014 01:25:53 GMT 
Expires: -1 
Cache-Control: private, max-age=0 
Content-Type: text/html; charset=ISO-8859-1 
Set-Cookie: **** line was too long and has been removed ****
P3P: **** line was too long and has been removed ****
Server: gws 
X-XSS-Protection: 1; mode=block 
X-Frame-Options: SAMEORIGIN 
Connection: close 
************************ 
msdn.microsoft.com 157.56.148.19 HEAD HTTP/1.1 
HTTP/1.1 301 Moved Permanently 
Content-Length: 0 
Location: http://157.56.148.19/en-us/default.aspx 
Server: Microsoft-IIS/8.0 
P3P: **** line was too long and has been removed ****
X-Powered-By: ASP.NET 
X-Instance: CH104 
Date: Mon, 31 Mar 2014 01:25:53 GMT 
Connection: close 

Output (second example):

msdn.microsoft.com 157.56.148.19 GET HTTP/0.9 
HTTP/1.1 400 Bad Request 
Content-Type: text/html; charset=us-ascii 
Server: Microsoft-HTTPAPI/2.0 
Date: Mon, 31 Mar 2014 01:27:13 GMT 
Connection: close 
Content-Length: 311 
************************ 
google.com 216.239.32.20 GET HTTP/0.9 
HTTP/1.0 400 Bad Request 
Content-Type: text/html; charset=UTF-8 
Content-Length: 1419 
Date: Mon, 31 Mar 2014 01:27:14 GMT 
Server: GFE/2.0 

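Now, if you look at the Server header of msdn.microsoft.com and google.com in our two examples, this tool lets us discover something new:

  • For an HTTP 1.1 request, the Server of google.com is gws; for an HTTP 0.9 request, Server changes to GFE/2.0.

  • For an HTTP 1.1 request, the Server of msdn.microsoft.com is Microsoft-IIS/8.0; for an HTTP 0.9 request, Server changes to Microsoft-HTTPAPI/2.0.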
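The comparison above was done by reading the banners by eye. As a rough sketch, the Server value could also be pulled out of the banner string programmatically. The helper name `parse_banner` is hypothetical, and this simple version keeps only the last value when a header is repeated.

```python
def parse_banner(banner):
    """Split a raw header block into (status_line, headers dict)."""
    lines = banner.splitlines()
    status_line = lines[0] if lines else ""
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            # header names are case-insensitive, so normalise to lower case
            headers[name.strip().lower()] = value.strip()
    return status_line, headers


# Sample taken from the second output listing above.
sample = (
    "HTTP/1.1 400 Bad Request\r\n"
    "Content-Type: text/html; charset=us-ascii\r\n"
    "Server: Microsoft-HTTPAPI/2.0\r\n"
    "Connection: close"
)
status_line, headers = parse_banner(sample)
print(status_line)
print(headers.get("server"))
```

Passing the `banner` returned by `http_banner_grabber` to this helper would let the two Server values be compared in code rather than by inspection.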

+0

It might be simpler to use 'httplib' to make the HEAD request. – jfs

+0

Thanks. That really helped. – user3371183

+0

@J.F.Sebastian, yes, 'httplib' can be used too, but 'socket' is more conceptual. Thanks, your suggestion is valid. –
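For reference, the 'httplib' suggestion from the comments might look roughly like the sketch below. It assumes Python 3, where the module is named `http.client`. The function name `head_server_banner` and its defaults are hypothetical, not taken from the answer.

```python
import http.client


def head_server_banner(host, port=80, timeout=10):
    """Send a HEAD request and return (status, reason, Server header)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # HEAD asks for the headers only, so no body is transferred
        conn.request("HEAD", "/")
        response = conn.getresponse()
        return response.status, response.reason, response.getheader("Server")
    finally:
        conn.close()
```

Calling `head_server_banner('216.239.32.20')` would return the status code, the reason phrase and the Server header value, or `None` for the header if the server does not send one. Unlike the raw socket version, this approach always sends a well-formed request, so it cannot reproduce the HTTP 0.9 experiment above.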