2010-08-20, 127 views
6

Python: how to download a zip file

I'm trying to download a zip file with this code:

import urllib
import urllib2

o = urllib2.build_opener(urllib2.HTTPCookieProcessor())

#login
p = urllib.urlencode({usernameField: usernameVal, passField: passVal})
f = o.open(authUrl, p)
data = f.read()
print data
f.close()

#download file
f = o.open(remoteFileUrl)
localFile = open(localFile, "wb")
localFile.write(f.read())
f.close()

I get some binary data back, but the file I "download" is much too small and is not a valid zip file. Am I not retrieving the zip file correctly? The HTTP response headers for f = o.open(remoteFileUrl) are shown below. I don't know whether a response like this needs special handling:

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Pragma: private
Cache-Control: must-revalidate
Expires: Tue, 31 Dec 1997 23:59:59 GMT
Content-Disposition: inline; filename="files.zip";
Content-Type: application/zip
Transfer-Encoding: chunked
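These headers do look right for a zip download: Content-Type: application/zip, and a suggested filename in Content-Disposition. As a side note, that filename parameter can be pulled out with the stdlib email.message module in Python 3; this is just a sketch for illustration, not part of the original question:

```python
from email.message import EmailMessage

# Rebuild just the header of interest from the response above.
msg = EmailMessage()
msg["Content-Disposition"] = 'inline; filename="files.zip"'

# get_filename() extracts the filename parameter of Content-Disposition.
print(msg.get_filename())
```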

Answers

10

f.read() doesn't necessarily read the whole file; it may return just one packet (which might be the whole file if it's small, but won't be for a large file).

You need to loop over the packets like this:

while 1:
    packet = f.read()
    if not packet:
        break
    localFile.write(packet)
f.close()

f.read() returns an empty packet to indicate that you've read the whole file.
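On Python 3 this same read-until-empty loop is built into shutil.copyfileobj. A minimal sketch, using an io.BytesIO as a stand-in for the network response (the fake payload is my own; in the thread's code the source would be the object returned by o.open(remoteFileUrl)):

```python
import io
import shutil

# Stand-in for the network response: any file-like object with .read().
payload = b"PK\x03\x04" + b"x" * 100000  # fake zip-like bytes
source = io.BytesIO(payload)

dest = io.BytesIO()
# copyfileobj runs the loop above internally: read a chunk,
# stop on b"" (EOF), otherwise write it out.
shutil.copyfileobj(source, dest)
```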

+2

I'd be curious where in the docs you found this – 2010-08-20 17:50:03

+0

http://docs.python.org/library/urllib.html#urllib.urlopen: "a file-like object is returned", and then http://docs.python.org/library/stdtypes.html#file.read – RichieHindle 2010-08-23 08:12:14

+0

Really just one packet? I checked the docs at the link shown, and I don't see anywhere that it says read() reads anything less than everything up to EOF. Can you explain more? – 2011-06-07 21:48:01

1

If you don't mind reading the whole zip file into memory, the fastest way to read and write it is as follows:

data = f.readlines()
with open(localFile, 'wb') as output:
    output.writelines(data)

Otherwise, to read and write chunks as they come over the network, do:

with open(localFile, "wb") as output:
    chunk = f.read()
    while chunk:
        output.write(chunk)
        chunk = f.read()

This is a little less neat, but avoids keeping the whole file in memory at once. Hope that helps.
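The chunked loop above can also be written with iter() and a sentinel, which avoids repeating the f.read() call. A sketch with io.BytesIO simulating the response (the 16 KiB chunk size is an arbitrary choice of mine):

```python
import io
from functools import partial

CHUNK = 16 * 1024
source = io.BytesIO(b"data" * 10000)  # simulated network response
sink = io.BytesIO()

# iter(callable, sentinel) keeps calling source.read(CHUNK) until it
# returns b"" (EOF), yielding each non-empty chunk in turn.
for chunk in iter(partial(source.read, CHUNK), b""):
    sink.write(chunk)
```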

0

Try this:

#download file
f = o.open(remoteFileUrl)

response = ""
while 1:
    data = f.read()
    if not data:
        break
    response += data

with open(localFile, "wb") as local_file:
    local_file.write(response)
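One caveat on the approach above: growing a string with += inside the loop can recopy the whole buffer on each iteration; the usual idiom is to collect the chunks in a list and join once. A sketch with a simulated response (the 4096-byte chunk size is my own choice):

```python
import io

source = io.BytesIO(b"\x00\x01\x02" * 5000)  # simulated 15000-byte response

parts = []
while True:
    data = source.read(4096)
    if not data:
        break
    parts.append(data)

# A single join at the end avoids the repeated copying that += can cause.
response = b"".join(parts)
```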
1

Here is a more robust solution that uses urllib2 to download the file in chunks and print the download status:

import os
import urllib2
import math

def downloadChunks(url):
    """Helper to download large files
        the only arg is a url
        this file will go to a temp directory
        the file will also be downloaded
        in chunks and print out how much remains
    """

    baseFile = os.path.basename(url)

    #move the file to a more uniq path
    os.umask(0002)
    temp_path = "/tmp/"
    try:
        file = os.path.join(temp_path, baseFile)

        req = urllib2.urlopen(url)
        total_size = int(req.info().getheader('Content-Length').strip())
        downloaded = 0
        CHUNK = 256 * 10240
        with open(file, 'wb') as fp:
            while True:
                chunk = req.read(CHUNK)
                downloaded += len(chunk)
                # float() avoids Python 2 integer division, which would
                # print 0.0 until the download finished
                print math.floor((downloaded / float(total_size)) * 100)
                if not chunk:
                    break
                fp.write(chunk)
    except urllib2.HTTPError, e:
        print "HTTP Error:", e.code, url
        return False
    except urllib2.URLError, e:
        print "URL Error:", e.reason, url
        return False

    return file
+0

IMO it will only work if it handles the case where no "Content-Length" header is sent – 2011-12-22 08:59:01

+0

Good point, Xavier – Gourneau 2011-12-23 22:15:52
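Following up on the Content-Length comment: a sketch (Python 3, my own function name and chunk size) of a copy loop that reports progress only when the size is known, with io.BytesIO standing in for the real response so no network is needed:

```python
import io

def copy_with_progress(resp, out, total_size=None, chunk_size=16384):
    """Copy resp to out in chunks, printing percent done when size is known.

    In real use total_size would come from int(resp.headers["Content-Length"])
    when that header exists; with Transfer-Encoding: chunked it is None.
    """
    downloaded = 0
    while True:
        chunk = resp.read(chunk_size)
        if not chunk:
            break
        out.write(chunk)
        downloaded += len(chunk)
        if total_size:  # guard: skip the percentage when the header was absent
            print("%3d%%" % (downloaded * 100 // total_size))
    return downloaded

# Simulated 40000-byte download.
payload = b"z" * 40000
written = copy_with_progress(io.BytesIO(payload), io.BytesIO(),
                             total_size=len(payload))
```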