2013-02-05

urlgrabber and gzip support

When using urlgrabber, what is the recommended way to handle files served with Content-Encoding: gzip?

Right now I monkey-patch it like this:

from urlgrabber.grabber import URLGrabber, PyCurlFileObject

g = URLGrabber(http_headers=(("Accept-Encoding", "gzip"),))
g.is_compressed = False  # I don't know yet if the server will send me compressed data

# Back up the current method of handling downloaded headers (only once)
try:
    PyCurlFileObject.orig_hdr_retrieve
except AttributeError:
    PyCurlFileObject.orig_hdr_retrieve = PyCurlFileObject._hdr_retrieve

def hdr_retrieve(instance, buf):
    r = PyCurlFileObject.orig_hdr_retrieve(instance, buf)
    # Flag the download as compressed when a Content-Encoding header mentions zip
    if "content-encoding" in buf.lower() and "zip" in buf.lower():
        g.is_compressed = True
    return r
PyCurlFileObject._hdr_retrieve = hdr_retrieve

g.urlgrab(url, dest)

if g.is_compressed:
    pass  # ungzip file here

But it doesn't look very clean, and I'm afraid it isn't thread-safe either...
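For the "ungzip file here" step, a minimal sketch using only the standard library might look like the following. The helper name gunzip_in_place is hypothetical (it is not part of urlgrabber), and the sketch assumes Python 3.3+ for os.replace and that dest is a path on the local filesystem.

```python
import gzip
import os
import shutil
import tempfile


def gunzip_in_place(path):
    """Replace the gzip-compressed file at `path` with its decompressed content.

    Hypothetical helper for illustration; not part of urlgrabber.
    """
    dir_name = os.path.dirname(os.path.abspath(path))
    # Decompress into a temporary file in the same directory so that the
    # final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "wb") as out, gzip.open(path, "rb") as src:
            shutil.copyfileobj(src, out)
        os.replace(tmp_path, path)
    except Exception:
        # Leave the original download untouched if decompression fails.
        os.remove(tmp_path)
        raise
```

It would be called as gunzip_in_place(dest) inside the "if g.is_compressed:" branch.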

Answer


I think I've found a thread-safe solution:

import pycurl
from urlgrabber.grabber import URLGrabber, PyCurlFileObject

g = URLGrabber(http_headers=(("Accept-Encoding", "gzip"),))
g.opts._set_attributes(grabber=g)  # let each file object find its grabber
try:
    PyCurlFileObject.orig_setopts
except AttributeError:
    # Patch only once
    PyCurlFileObject.orig_setopts = PyCurlFileObject._set_opts

    def setopts(instance, opts={}):
        PyCurlFileObject.orig_setopts(instance, opts)
        grabber = instance.opts.grabber
        grabber.is_compressed = False

        def hdr_retrieve(buf):
            r = PyCurlFileObject._hdr_retrieve(instance, buf)
            if "content-encoding" in buf.lower() and "zip" in buf.lower():
                grabber.is_compressed = True
            return r

        instance.curl_obj.setopt(pycurl.HEADERFUNCTION, hdr_retrieve)
    PyCurlFileObject._set_opts = setopts

...but it still doesn't feel very "clean" :)
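The substring test on the header buffer could be made a little stricter by parsing the header line first. The function below is only a sketch: the name is_gzip_encoded is hypothetical, and accepting bytes input is an assumption based on pycurl passing header lines as bytes on Python 3.

```python
def is_gzip_encoded(header_line):
    """Return True if `header_line` is a Content-Encoding header naming gzip.

    Hypothetical helper for illustration; not part of urlgrabber or pycurl.
    """
    if isinstance(header_line, bytes):
        # Assumption: header lines arrive as bytes on Python 3.
        header_line = header_line.decode("iso-8859-1")
    name, sep, value = header_line.partition(":")
    if not sep or name.strip().lower() != "content-encoding":
        return False
    # The value may list several codings, e.g. "gzip, identity".
    codings = [c.strip().lower() for c in value.split(",")]
    return "gzip" in codings or "x-gzip" in codings
```

Inside hdr_retrieve, the condition would then read "if is_gzip_encoded(buf):". Unlike the substring test, this only reacts to a header whose name is exactly Content-Encoding.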