2013-04-16

web.py: get the raw web request

I need to access the raw HTTP request that the browser sends to a web.py server.

For example, this is the request Chromium sends when I browse a page:

$ nc -l 8081 
GET / HTTP/1.1
Host: 127.0.0.1:8081 
Connection: keep-alive 
Cache-Control: max-age=0 
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 
User-Agent: Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.22 (KHTML, like Gecko) Ubuntu Chromium/25.0.1364.160 Chrome/25.0.1364.160 Safari/537.22 
Accept-Encoding: gzip,deflate,sdch 
Accept-Language: it-IT,it;q=0.8,en-US;q=0.6,en;q=0.4 
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3 

I tried to get it from web.ctx.env, but that is a dictionary (while I would prefer the original raw text of the request), and it is mixed with other data:

SERVER_SOFTWARE: CherryPy/3.2.0 Server 
SCRIPT_NAME: 
ACTUAL_SERVER_PROTOCOL: HTTP/1.1 
REQUEST_METHOD: GET 
PATH_INFO: /
SERVER_PROTOCOL: HTTP/1.1 
QUERY_STRING: 
HTTP_ACCEPT_CHARSET: ISO-8859-1,utf-8;q=0.7,*;q=0.3 
HTTP_USER_AGENT: Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.22 (KHTML, like Gecko) Ubuntu Chromium/25.0.1364.160 Chrome/25.0.1364.160 Safari/537.22 
HTTP_CONNECTION: keep-alive 
REMOTE_PORT: 55409 
SERVER_NAME: localhost 
REMOTE_ADDR: 127.0.0.1 
wsgi.url_scheme: http 
SERVER_PORT: 8081 
wsgi.input: <web.wsgiserver.KnownLengthRFile object at 0x940b16c> 
HTTP_HOST: 127.0.0.1:8081 
wsgi.multithread: True 
REQUEST_URI: /
HTTP_ACCEPT: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 
wsgi.version: (1, 0) 
wsgi.run_once: False 
wsgi.errors: <open file '<stderr>', mode 'w' at 0xb73010d0> 
wsgi.multiprocess: False 
HTTP_ACCEPT_LANGUAGE: it-IT,it;q=0.8,en-US;q=0.6,en;q=0.4 
HTTP_ACCEPT_ENCODING: gzip,deflate,sdch 

This is the code I used to get the output above:

#!/usr/bin/env python

import web

urls = ('(.*)', 'urlhandler')

class urlhandler:
    def GET(self, url):
        # dump every key/value pair of the WSGI environment, one per line
        txt = ""
        for k, v in web.ctx.env.items():
            txt += ": ".join([k, str(v)]) + "\n"
        return txt

if __name__ == '__main__':
    app = web.application(urls, globals())
    app.run()

Should I strip the unneeded data out of this dictionary, or is there a direct way to get the raw request?

Answers


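Following Andrey's suggestion, I came up with this code. It tries to reconstruct the web request; it may not be the best way to get it, but it is the only way I have found so far.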

This program displays the web request for the requested page (it works for both POST and GET requests):

#!/usr/bin/env python

import web
from urllib import quote  # Python 2; on Python 3 this is urllib.parse.quote

urls = ('(.*)', 'urlhandler')

def adaptHeader(txt):
    """Input: string, header name as it is in web.ctx.env
    Output: string, header name according to the HTTP protocol,
    e.g. "HTTP_CACHE_CONTROL" => "Cache-Control"
    """
    txt = txt.replace('HTTP_', '')
    return '-'.join(t[0] + t[1:].lower() for t in txt.split('_'))

def rawRequest(env):
    """Reconstruct and return the web request based on web.ctx.env"""

    # url reconstruction
    # see http://www.python.org/dev/peps/pep-0333/#url-reconstruction
    url = env['wsgi.url_scheme'] + '://'  # http/https
    url += env.get('HTTP_HOST') or (env['SERVER_NAME'] + ':' + env['SERVER_PORT'])  # host + port
    url += quote(env.get('SCRIPT_NAME', ''))
    url += quote(env.get('PATH_INFO', ''))
    url += ('?' + env['QUERY_STRING']) if env.get('QUERY_STRING') else ''  # GET querystring

    # request line: method, URL, protocol
    req = ' '.join((env['REQUEST_METHOD'], url, env['SERVER_PROTOCOL'])) + '\n'

    # headers (WSGI stores Content-Type/Content-Length without the HTTP_ prefix)
    for k, v in env.items():
        if k.startswith('HTTP_') or k in ('CONTENT_TYPE', 'CONTENT_LENGTH'):
            req += adaptHeader(k) + ': ' + str(v) + '\n'

    # post data: read the body only when a valid Content-Length was sent
    try:
        req += '\n' + env['wsgi.input'].read(int(env['CONTENT_LENGTH']))
    except (KeyError, ValueError):
        pass

    return req

class urlhandler:
    def GET(self, url):
        return rawRequest(web.ctx.env)

    def POST(self, url):
        return rawRequest(web.ctx.env)

if __name__ == '__main__':
    app = web.application(urls, globals())
    app.run()
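For comparison, here is a Python 3 sketch of the same reconstruction that does not depend on web.py, so it can be tried on a plain dict. Some assumptions go beyond the answer. The names `header_name` and `raw_request` are hypothetical. It uses the origin-form request target (`/path?query`, as in the nc capture in the question) instead of the absolute URL, along with CRLF line endings, and it returns bytes. It still cannot recover the original header order or casing, because the WSGI server has already parsed the headers into a dict, so headers are emitted in sorted key order.

```python
from urllib.parse import quote


def header_name(key):
    """Convert a WSGI environ key to an HTTP header name.

    "HTTP_CACHE_CONTROL" -> "Cache-Control", "CONTENT_TYPE" -> "Content-Type"
    """
    if key.startswith("HTTP_"):
        key = key[len("HTTP_"):]
    return "-".join(part.capitalize() for part in key.split("_"))


def raw_request(environ):
    """Approximate the raw HTTP/1.x request from a PEP 3333 environ dict.

    Header order and original casing are NOT preserved: the server
    stored the headers in a dict, so that information is already lost.
    """
    # Request target in origin-form, like the line a browser actually sends.
    target = quote(environ.get("SCRIPT_NAME", "")) + quote(environ.get("PATH_INFO", ""))
    if environ.get("QUERY_STRING"):
        target += "?" + environ["QUERY_STRING"]
    lines = ["%s %s %s" % (environ["REQUEST_METHOD"], target or "/",
                           environ["SERVER_PROTOCOL"])]

    # Client headers: HTTP_* keys plus the two unprefixed CONTENT_* keys.
    for key, value in sorted(environ.items()):
        if key.startswith("HTTP_") or key in ("CONTENT_TYPE", "CONTENT_LENGTH"):
            if value != "":
                lines.append("%s: %s" % (header_name(key), value))

    head = "\r\n".join(lines) + "\r\n\r\n"

    # Read the body only when a valid, positive Content-Length is present.
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    body = environ["wsgi.input"].read(length) if length > 0 else b""

    # PEP 3333 native strings carry latin-1 data, so encode them back that way.
    return head.encode("latin-1") + body
```

Inside a handler it would presumably be called as `raw_request(web.ctx.env)`, assuming a Python 3 release of web.py. If you need the exact bytes in their original order, you have to capture them before the server parses them, as `nc` does in the question.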

It seems you can compute the requested URL more simply: 'web.ctx.home + web.ctx.fullpath'. See http://webpy.org/cookbook/ctx
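Outside web.py, the standard library already implements the PEP 333 URL reconstruction used in the answer, in `wsgiref.util.request_uri`. This sketch is not part of the original thread. It feeds that function an environ built from the values in the question, plus a hypothetical path and query string:

```python
from wsgiref.util import request_uri

# Values taken from the web.ctx.env dump in the question;
# PATH_INFO and QUERY_STRING are made up for the example.
environ = {
    "wsgi.url_scheme": "http",
    "HTTP_HOST": "127.0.0.1:8081",
    "SERVER_NAME": "localhost",
    "SERVER_PORT": "8081",
    "SCRIPT_NAME": "",
    "PATH_INFO": "/some/page",
    "QUERY_STRING": "a=1&b=2",
}

print(request_uri(environ))                       # full URL including ?query
print(request_uri(environ, include_query=False))  # without the query string
```

Like the hand-written version, it prefers `HTTP_HOST` and falls back to `SERVER_NAME` and `SERVER_PORT` only when no Host header was sent.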


Given what you have, you can filter web.ctx.env by the keys that start with "HTTP_". That is easier than getting the raw request headers and parsing them.

You can check which environ variables correspond to the client-supplied HTTP request headers in the WSGI specification: http://www.python.org/dev/peps/pep-0333/#environ-variables

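"HTTP_ Variables: Variables corresponding to the client-supplied HTTP request headers (i.e., variables whose names begin with "HTTP_"). The presence or absence of these variables should correspond with the presence or absence of the appropriate HTTP header in the request."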
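A minimal sketch of the filtering this answer suggests. The function name `client_headers` is hypothetical. Besides the `HTTP_` keys, it also keeps `CONTENT_TYPE` and `CONTENT_LENGTH`, which PEP 333 stores without the prefix even though they come from request headers:

```python
def client_headers(environ):
    """Return the client-supplied headers from a WSGI environ dict.

    Server and WSGI entries (REMOTE_PORT, SERVER_SOFTWARE, wsgi.input, ...)
    are dropped; keys are turned back into header-style names.
    """
    headers = {}
    for key, value in environ.items():
        if key.startswith("HTTP_"):
            name = key[len("HTTP_"):]
        elif key in ("CONTENT_TYPE", "CONTENT_LENGTH"):
            name = key
        else:
            continue  # not a request header
        headers["-".join(part.capitalize() for part in name.split("_"))] = value
    return headers
```

In the question's handler this would be called as `client_headers(web.ctx.env)`. The result is still a dict, so the original header order is not available.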


Thanks, maybe this is the only way to get it. In any case, I don't need to parse the request, I just need it. – etuardu