Download file from web in Python 3

167

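I am creating a program that will download a .jar (Java) file from a web server by reading the URL specified in the .jad file of the same game/application. I'm using Python 3.2.1.

I've managed to extract the URL of the JAR file from the JAD file (every JAD file contains the URL to the JAR file), but as you may imagine, the extracted value is a string (type `str`).

Here's the relevant function: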

def downloadFile(URL=None): 
    import httplib2 
    h = httplib2.Http(".cache") 
    resp, content = h.request(URL, "GET") 
    return content 

downloadFile(URL_from_file)

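However, I always get an error saying that the type in the function above has to be bytes, and not string. I've tried using URL.encode('utf-8') and also bytes(URL, encoding='utf-8'), but I always get the same or a similar error.

So basically my question is: how do I download a file from a server when the URL is stored in a string type?

Source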

2011-08-30 Bo Milanovich

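@alvas, a bounty for this? The answerer is still (and quite) active on SO. Why not just add a comment and ask? –

Because a good answer that has stood the test of time deserves a reward. Also, we should start doing this for many other questions to check whether the answers are still relevant today. Especially since the sorting of SO answers is rather crazy, sometimes outdated or even the worst answers rise to the top. – alvas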

353

If you want to get the contents of a web page into a variable, just read the response of urllib.request.urlopen:

import urllib.request 
... 
url = 'http://example.com/' 
response = urllib.request.urlopen(url) 
data = response.read()  # a `bytes` object 
text = data.decode('utf-8') # a `str`; this step can't be used if data is binary
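
The `decode('utf-8')` step above hard-codes the encoding. A minimal sketch of reading the charset from the response's Content-Type header instead, assuming the body really is text; the helper names `decode_body` and `fetch_text` and the UTF-8 fallback are illustrative choices, not part of the original answer:

```python
import urllib.request
from email.message import Message


def decode_body(data: bytes, headers: Message, default: str = "utf-8") -> str:
    """Decode `data` using the charset declared in `headers`, if any.

    `headers` is expected to behave like `response.headers` as returned by
    urllib.request.urlopen (a subclass of email.message.Message).
    """
    charset = headers.get_content_charset() or default
    return data.decode(charset)


def fetch_text(url: str) -> str:
    # Hypothetical convenience wrapper around the helper above.
    with urllib.request.urlopen(url) as response:
        return decode_body(response.read(), response.headers)
```

If the server declares no charset, the sketch falls back to `default`; a wrong or missing declaration can still raise `UnicodeDecodeError`.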

The easiest way to download and save a file is to use the urllib.request.urlretrieve function:

import urllib.request 
... 
# Download the file from `url` and save it locally under `file_name`: 
urllib.request.urlretrieve(url, file_name)

import urllib.request 
... 
# Download the file from `url`, save it in a temporary directory and get the 
# path to it (e.g. '/tmp/tmpb48zma.txt') in the `file_name` variable: 
file_name, headers = urllib.request.urlretrieve(url)

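But keep in mind that urlretrieve is considered legacy and might become deprecated (not sure why, though).

So the most correct way to do this is to use the urllib.request.urlopen function to obtain a file-like object that represents the HTTP response, and copy it to a real file using shutil.copyfileobj.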

import urllib.request 
import shutil 
... 
# Download the file from `url` and save it locally under `file_name`: 
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file: 
    shutil.copyfileobj(response, out_file)

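If this seems too complicated, you may want to go simpler and store the whole download in a bytes object, then write it to a file. But this works well only for small files.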

import urllib.request 
... 
# Download the file from `url` and save it locally under `file_name`: 
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file: 
    data = response.read() # a `bytes` object 
    out_file.write(data)
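
The reason this suits only small files is that `response.read()` holds the entire body in memory before anything is written. A rough sketch of what a chunked copy does instead (essentially the work `shutil.copyfileobj` takes care of); the function name `copy_in_chunks` and the 64 KiB chunk size are arbitrary choices for illustration:

```python
import urllib.request


def copy_in_chunks(source, destination, chunk_size=64 * 1024):
    """Copy from one binary file-like object to another, one chunk at a time.

    Returns the total number of bytes written.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        destination.write(chunk)
        total += len(chunk)
    return total


def download(url, file_name):
    # Only about `chunk_size` bytes are held by this loop at any moment.
    with urllib.request.urlopen(url) as response, open(file_name, "wb") as out_file:
        return copy_in_chunks(response, out_file)
```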

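It is possible to extract .gz (and maybe other formats) compressed data on the fly, but such an operation probably requires the HTTP server to support random access to the file.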

import urllib.request 
import gzip 
... 
# Read the first 64 bytes of the file inside the .gz archive located at `url` 
url = 'http://example.com/something.gz' 
with urllib.request.urlopen(url) as response: 
    with gzip.GzipFile(fileobj=response) as uncompressed: 
        file_header = uncompressed.read(64)  # a `bytes` object
        # Or do anything shown above using `uncompressed` instead of `response`.

Source

2011-08-30 13:49:09

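You could use `response.info().get_param('charset', 'utf-8')` instead of hard-coding `utf-8`, to get the character encoding from the Content-Type header. – jfs

@OlehPrypin Why does `outfile.write(data)` only work well for small files? – Startec

"urlretrieve is considered legacy and might become deprecated": where did you get that idea? –

I hope I understood the question correctly, which is: how do you download a file from a server when the URL is stored in a string type?

I download files and save them locally using the code below: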

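import requests

url = 'https://www.python.org/static/img/python-logo.png'
# Raw string so that the backslashes in the Windows path are kept literally
fileName = r'D:\Python\dwnldPythonLogo.png'
req = requests.get(url)
# The `with` block closes the file even if writing fails
with open(fileName, 'wb') as file:
    for chunk in req.iter_content(100000):
        file.write(chunk)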
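
One thing to be aware of with the snippet above: without `stream=True`, `requests.get` downloads the whole body before `iter_content` is called, so the chunked loop does not reduce memory use. A minimal sketch of a streaming variant follows; the helper name `write_chunks` is hypothetical, and the `requests` call is shown only in a comment so that the helper itself has no third-party dependency:

```python
def write_chunks(chunks, file_name):
    """Write an iterable of bytes chunks to `file_name`; return bytes written."""
    written = 0
    with open(file_name, "wb") as out_file:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                out_file.write(chunk)
                written += len(chunk)
    return written


# Intended use with requests (assumes `requests` is installed):
#
#   import requests
#   with requests.get(url, stream=True) as response:
#       response.raise_for_status()
#       write_chunks(response.iter_content(chunk_size=8192), file_name)
```

The 8192-byte chunk size is an arbitrary value; larger chunks mean fewer writes at the cost of more memory per chunk.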

Source

2016-01-18 20:32:35

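Hi, I am also using the same type of code to download files, but sometimes I run into an exception like: 'charmap' codec can't encode character '\u010c'..... Can you help me with that? – Joyson

I use the requests package whenever I want something related to HTTP requests, because its API is very easy to get started with:

First, install requests:

$ pip install requests

Then the code: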

from requests import get  # to make GET request


def download(url, file_name):
    # open in binary mode
    with open(file_name, "wb") as file:
        # get request
        response = get(url)
        # write to file
        file.write(response.content)

Source

2016-01-23 14:21:13

+29

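+1 for mentioning that requests has to be installed first. This culture should spread on Stack Overflow and neighbouring sites, because assuming that the audience already knows everything is wrong. – TechJS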

-2

from urllib import request


def get(url):
    with request.urlopen(url) as r:
        return r.read()


def download(url, file=None):
    if not file:
        file = url.split('/')[-1]
    with open(file, 'wb') as f:
        f.write(get(url))
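
Note that `url.split('/')[-1]` keeps any query string and yields an empty name for URLs ending in `/`. A small sketch of a more forgiving way to derive the name, assuming the last path segment is a sensible file name; `filename_from_url` and the `'download'` fallback are made up for this example:

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download"):
    """Return the last path segment of `url`, ignoring query and fragment."""
    path = urlsplit(url).path
    # Decode percent-escapes first, then keep only the final path component.
    name = posixpath.basename(unquote(path))
    return name or default
```

The result could then be passed as the `file` argument of `download` above.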

Source

2017-03-17 09:35:56 user7726287

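You can use wget, a popular shell download tool, for this: https://pypi.python.org/pypi/wget. This is the simplest method, since you do not need to open the destination file yourself. Here is an example.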

import wget 
url = 'https://i1.wp.com/python3.codes/wp-content/uploads/2015/06/Python3-powered.png?fit=650%2C350' 
wget.download(url, '/Users/scott/Downloads/cat4.jpg')

Source

2018-01-13 19:39:14
