2011-11-09 59 views

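Python raw HTTP request retrieval issues via urllib2

I am using Python 2.6.5 and trying to capture the raw HTTP request that is sent. This works fine until I add a proxy handler into the mix. The situation is as follows: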

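  • HTTP and HTTPS requests work fine without the proxy handler: raw HTTP request captured
  • HTTP requests work fine with the proxy handler: proxy OK, raw HTTP request captured
  • HTTPS requests fail with the proxy handler: proxy OK, but the raw HTTP request is NOT captured!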

Related questions come close, but do not solve my problem.

This is what I am doing:

import httplib
import urllib2

class MyHTTPConnection(httplib.HTTPConnection):
    def send(self, s):
        global RawRequest
        RawRequest = s # Saving to global variable for Requester class to see
        httplib.HTTPConnection.send(self, s)

class MyHTTPHandler(urllib2.HTTPHandler):
    def http_open(self, req):
        return self.do_open(MyHTTPConnection, req)

class MyHTTPSConnection(httplib.HTTPSConnection):
    def send(self, s):
        global RawRequest
        RawRequest = s # Saving to global variable for Requester class to see
        httplib.HTTPSConnection.send(self, s)

class MyHTTPSHandler(urllib2.HTTPSHandler):
    def https_open(self, req):
        return self.do_open(MyHTTPSConnection, req)

Requester class:

global RawRequest 
ProxyConf = { 'http':'http://127.0.0.1:8080', 'https':'http://127.0.0.1:8080' } 
# If ProxyConf = { 'http':'http://127.0.0.1:8080' }, then Raw HTTPS request captured BUT the proxy does not see the HTTPS request! 
# Also tried with similar results:  ProxyConf = { 'http':'http://127.0.0.1:8080', 'https':'https://127.0.0.1:8080' } 
ProxyHandler = urllib2.ProxyHandler(ProxyConf) 
urllib2.install_opener(urllib2.build_opener(ProxyHandler, MyHTTPHandler, MyHTTPSHandler)) 
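# Request() only builds the request object; urlopen() is what actually sends it
urllib2.urlopen(urllib2.Request('http://www.google.com', None)) # global RawRequest updated
# This is the problem: global RawRequest NOT updated!?
urllib2.urlopen(urllib2.Request('https://accounts.google.com', None))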

BUT if I remove the ProxyHandler, it works:

global RawRequest 
urllib2.install_opener(urllib2.build_opener(MyHTTPHandler, MyHTTPSHandler)) 
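# Request() only builds the request object; urlopen() is what actually sends it
urllib2.urlopen(urllib2.Request('http://www.google.com', None)) # global RawRequest updated
urllib2.urlopen(urllib2.Request('https://accounts.google.com', None)) # global RawRequest updated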
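
One thing that can be checked without sending anything is whether the custom handlers actually end up in the opener once a ProxyHandler is added. The sketch below is only a sanity check under some assumptions: the handler subclasses are empty placeholders rather than the capturing ones, Python is assumed to be built with SSL support so that HTTPSHandler exists, and the try/except import is there only so the snippet also runs on Python 3, where urllib2 became urllib.request. build_opener() leaves out its default HTTPHandler and HTTPSHandler when subclasses of them are passed in, so the custom classes should be the ones listed.

```python
try:
    import urllib2  # Python 2, as in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same classes


class MyHTTPHandler(urllib2.HTTPHandler):
    pass  # placeholder: the real class would override http_open


class MyHTTPSHandler(urllib2.HTTPSHandler):
    pass  # placeholder: the real class would override https_open


ProxyConf = {'http': 'http://127.0.0.1:8080', 'https': 'http://127.0.0.1:8080'}
opener = urllib2.build_opener(urllib2.ProxyHandler(ProxyConf),
                              MyHTTPHandler, MyHTTPSHandler)

# Classes of the registered handlers; nothing is sent over the network.
handler_classes = [h.__class__ for h in opener.handlers]
for cls in handler_classes:
    print(cls.__name__)
```

If both custom classes show up and the stock HTTPHandler and HTTPSHandler do not, the handlers are wired in as intended, and the missing capture has to come from what happens inside the connection rather than from the opener setup.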

How can I add the ProxyHandler into the mix while keeping access to RawRequest?

Thank you in advance.

If you are sure you have the answer, please post it as an answer rather than as a comment.

Good point Jonathan, I just moved the comment to the answer section. Cheers.

Answer


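Answering my own question: this seems to be a bug in the underlying library. Making RawRequest a list works around the problem, with the raw HTTP request as the first item. The custom HTTPS class is called several times, and the last call is empty. The fact that the custom HTTP class is called only once suggests that this is a bug in Python, but the list solution gets around it:

RawRequest = s

simply becomes:

RawRequest.append(s)

with RawRequest initialised beforehand as RawRequest = [], and the raw request retrieved via RawRequest[0] (the first element of the list).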
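
For reference, here is a minimal sketch of the list-based capture in isolation. It makes a few assumptions that are not part of my original code: FakeSocket is a hypothetical stand-in for a real connection so that nothing is sent over the network, example.invalid is a placeholder host, and the try/except import is there only so the snippet also runs on Python 3, where httplib is named http.client. It demonstrates the append pattern on a plain HTTP connection only; it does not reproduce the proxied HTTPS case.

```python
try:
    import httplib  # Python 2, as in the question
except ImportError:
    import http.client as httplib  # Python 3 name for the same module

RawRequest = []  # every chunk passed to send() is appended here


class FakeSocket(object):
    """Hypothetical stand-in for a real socket; nothing touches the network."""

    def sendall(self, data):
        pass  # discard the bytes; we only care that send() saw them


class MyHTTPConnection(httplib.HTTPConnection):
    def send(self, s):
        RawRequest.append(s)  # keep all chunks instead of overwriting
        httplib.HTTPConnection.send(self, s)


conn = MyHTTPConnection('example.invalid')  # placeholder host, never resolved
conn.sock = FakeSocket()  # pretend the connection is already open
conn.request('GET', '/')

raw = RawRequest[0]  # first captured chunk: request line plus headers
print(raw)
```

Because every call to send() is kept, a later empty call can no longer overwrite the request that was captured earlier, which is the whole point of switching from assignment to append.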