Python | HTTP - How to check file size before downloading

I'm using urllib3 to crawl the web. Sample code:

from urllib3 import PoolManager 

pool = PoolManager() 
response = pool.request("GET", url)

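The problem is that I may stumble on a URL that is a download of a really large file, and I'm not interested in downloading it.

I found this question - Link - which suggests using urllib and urlopen. I don't want to contact the server twice.

I want to limit the file size to 25 MB. Is there a way to do this with urllib3?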

2016-11-14 Danny Hambourg

Read until you hit 25 MB, then cancel the download? – jarmod

That's an option. How can I do that? –

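You can use the HTTP HEAD verb and read the Content-Length header to retrieve the size. If the server omits Content-Length, there is no way to check the size unless you start downloading the file, as jarmod mentioned. –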
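A minimal sketch of the HEAD approach from the comment above. The helper names `declared_size` and `head_size` are hypothetical, and `pool` is assumed to be a `urllib3.PoolManager`. Note that this costs an extra round trip, which the question wanted to avoid.

```python
from typing import Mapping, Optional


def declared_size(headers: Mapping[str, str]) -> Optional[int]:
    """Return Content-Length as an int, or None if absent or malformed."""
    raw = headers.get("Content-Length")
    if raw is None:
        return None
    try:
        size = int(raw)
    except ValueError:
        return None
    return size if size >= 0 else None


def head_size(pool, url: str) -> Optional[int]:
    """Send a HEAD request (no body is transferred) and report the declared size.

    `pool` is assumed to be a urllib3.PoolManager (hypothetical wiring).
    """
    response = pool.request("HEAD", url)
    return declared_size(response.headers)
```

With urllib3 this would be called as `head_size(PoolManager(), url)`. A `None` result means the server did not declare a usable size, so the only remaining option is to start the download and stop once the limit is reached.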

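If the server provides a Content-Length header, you can use it to decide whether to continue downloading the rest of the body. If the server does not provide the header, you need to stream the response until you decide not to continue.

To do this, you need to make sure you are not preloading the full response.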

from urllib3 import PoolManager 

pool = PoolManager() 
response = pool.request("GET", url, preload_content=False) 

# Maximum amount we want to read 
max_bytes = 1000000 

content_bytes = response.headers.get("Content-Length") 
if content_bytes and int(content_bytes) < max_bytes: 
    # Expected body is smaller than our maximum, read the whole thing 
    data = response.read() 
    # Do something with data 
    ... 
elif content_bytes is None: 
    # Alternatively, stream until we hit our limit 
    amount_read = 0 
    for chunk in response.stream():
        amount_read += len(chunk)
        # Save chunk
        ...
        if amount_read > max_bytes:
            break

# Release the connection back into the pool 
response.release_conn()

2016-11-14 18:38:18 shazow

I also opened an issue to improve our documentation for this scenario. Please add any helpful notes there: https://github.com/shazow/urllib3/issues/1037 – shazow

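Quick question: since you release the connection back into the pool without closing it, won't the next request pick up the rest of the download and break because it doesn't look like an HTTP response? Shouldn't the connection be forcibly closed? – spectras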

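@spectras Honestly, I'm not 100% sure what would happen, but if the connection really can't be recovered, I'd consider it a bug in urllib3 and ask you to report it. :) I'm pretty sure we check before re-using a connection. – shazow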
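Given the uncertainty in this thread, a conservative variant is to close the connection whenever the body was abandoned part-way, and only hand it back for reuse after a complete read. The sketch below wraps the answer's logic in a hypothetical helper, `fetch_capped`, using the 25 MB limit from the question. It assumes `pool` is a `urllib3.PoolManager` and relies on the response's `stream()`, `close()` and `release_conn()` methods. Whether closing is strictly necessary is not settled here.

```python
from typing import Optional

MAX_BYTES = 25 * 1024 * 1024  # the 25 MB limit from the question


def fetch_capped(pool, url: str, max_bytes: int = MAX_BYTES,
                 chunk_size: int = 64 * 1024) -> Optional[bytes]:
    """Return the body if it fits within max_bytes, otherwise None.

    `pool` is assumed to be a urllib3.PoolManager (hypothetical wiring).
    """
    response = pool.request("GET", url, preload_content=False)
    completed = False
    try:
        declared = response.headers.get("Content-Length")
        if declared is not None and declared.isdigit() and int(declared) > max_bytes:
            return None  # too large according to the server; body never read

        chunks = []
        total = 0
        for chunk in response.stream(chunk_size):
            total += len(chunk)
            if total > max_bytes:
                return None  # limit exceeded mid-download
            chunks.append(chunk)
        completed = True
        return b"".join(chunks)
    finally:
        if not completed:
            # Unread bytes may remain on the socket, so close it rather than
            # offering it for reuse (a conservative assumption, see above).
            response.close()
        response.release_conn()
```

The trade-off is one lost keep-alive connection per oversized file, in exchange for not depending on how the pool treats a half-read response.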
