I am using Python 3.3.1. I have written a function called download_file() that downloads a file and saves it to disk. Why does it not work properly for text files?
#!/usr/bin/python3
# -*- coding: utf8 -*-
import datetime
import os
import urllib.error
import urllib.request
def download_file(*urls, download_location=os.getcwd(), debugging=False):
    """Downloads the files provided as multiple url arguments.

    Provide the url for files to be downloaded as strings. Separate the
    files to be downloaded by a comma.

    The function would download the files and save them in the folder
    provided as keyword-argument for download_location. If
    download_location is not provided, then the file would be saved in
    the current working directory. Folder for download_location would be
    created if it doesn't already exist. Do not worry about trailing
    slash at the end for download_location. The code would take care of
    it for you.

    If the download encounters an error it would alert about it and
    provide the information about the Error Code and Error Reason (if
    received from the server).

    Normal Usage:
    >>> download_file('http://localhost/index.html',
                      'http://localhost/info.php')
    >>> download_file('http://localhost/index.html',
                      'http://localhost/info.php',
                      download_location='/home/aditya/Download/test')
    >>> download_file('http://localhost/index.html',
                      'http://localhost/info.php',
                      download_location='/home/aditya/Download/test/')

    In Debug Mode, files are not downloaded, nor is there any attempt
    to establish the connection with the server. It just prints out the
    filename and its url that would have been attempted to be
    downloaded in Normal Mode.

    By Default, Debug Mode is inactive. In order to activate it, we
    need to supply a keyword-argument as 'debugging=True', like:
    >>> download_file('http://localhost/index.html',
                      'http://localhost/info.php',
                      debugging=True)
    >>> download_file('http://localhost/index.html',
                      'http://localhost/info.php',
                      download_location='/home/aditya/Download/test',
                      debugging=True)
    """
    # Append a trailing slash at the end of download_location if not
    # already present
    if download_location[-1] != '/':
        download_location = download_location + '/'

    # Create the folder for download_location if not already present
    os.makedirs(download_location, exist_ok=True)

    # Other variables
    time_format = '%Y-%b-%d %H:%M:%S'   # '2000-Jan-01 22:10:00'

    # "Request Headers" information for the file to be downloaded
    accept = 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
    accept_encoding = 'gzip, deflate'
    accept_language = 'en-US,en;q=0.5'
    connection = 'keep-alive'
    user_agent = ('Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:20.0) '
                  'Gecko/20100101 Firefox/20.0')
    headers = {'Accept': accept,
               'Accept-Encoding': accept_encoding,
               'Accept-Language': accept_language,
               'Connection': connection,
               'User-Agent': user_agent,
               }

    # Loop through all the files to be downloaded
    for url in urls:
        filename = os.path.basename(url)
        if not debugging:
            try:
                request_sent = urllib.request.Request(url, None, headers)
                response_received = urllib.request.urlopen(request_sent)
            except urllib.error.URLError as error_encountered:
                print(datetime.datetime.now().strftime(time_format),
                      ':', filename, '- The file could not be downloaded.')
                if hasattr(error_encountered, 'code'):
                    print(' ' * 22, 'Error Code -', error_encountered.code)
                if hasattr(error_encountered, 'reason'):
                    print(' ' * 22, 'Reason -', error_encountered.reason)
            else:
                read_response = response_received.read()
                output_file = download_location + filename
                with open(output_file, 'wb') as downloaded_file:
                    downloaded_file.write(read_response)
                print(datetime.datetime.now().strftime(time_format),
                      ':', filename, '- Downloaded successfully.')
        else:
            print(datetime.datetime.now().strftime(time_format),
                  ': Debugging :', filename, 'would be downloaded from :\n',
                  ' ' * 21, url)
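This function works well for downloading PDFs, images and other formats, but it gives trouble with text documents such as html files. I suspect the problem has something to do with this line near the end: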
with open(output_file, 'wb') as downloaded_file:
So I tried opening the file in wt mode. I also tried plain w mode. Neither solved the problem.
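For context, here is a minimal sketch of why switching the file mode is unlikely to help. In Python 3, read() on an HTTP response returns bytes, and a file opened in text mode ('w' or 'wt') only accepts str, so 'wb' is the mode that matches the data. The payload, the try_write helper and the temporary file names below are made up for illustration.

```python
import os
import tempfile

# Hypothetical payload standing in for response_received.read(),
# which returns bytes in Python 3.
payload = b'<html><body>hello</body></html>'


def try_write(path, mode, data):
    """Write data to path using mode.

    Returns None on success, or the exception class name if the
    write is rejected with a TypeError.
    """
    try:
        with open(path, mode) as handle:
            handle.write(data)
    except TypeError as error:
        return type(error).__name__
    return None


with tempfile.TemporaryDirectory() as folder:
    binary_result = try_write(os.path.join(folder, 'a.html'), 'wb', payload)
    text_result = try_write(os.path.join(folder, 'b.html'), 'w', payload)

print('wb:', binary_result)
print('w :', text_result)
```

Binary mode writes the bytes exactly as received, while text mode rejects them, so the garbled output most likely comes from the content of the bytes rather than from how the file is opened.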
Another possible cause was the encoding, so I also included this as the second line:
# -*- coding: utf8 -*-
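But it still does not work. What could be the problem, and how can I make the function work for both text and binary files?
Example of what does not work: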
>>> download_file("http://docs.python.org/3/tutorial/index.html")
When I open the downloaded file in Gedit, it shows up as:
The same happens when it is opened in Firefox:
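One possible explanation, offered as an assumption rather than a confirmed diagnosis: the request headers in the function include 'Accept-Encoding': 'gzip, deflate', and urllib.request does not decompress response bodies by itself. A server that honors the header may therefore send compressed bytes, which are then written to disk unchanged. PDFs and images are often served uncompressed, which would fit the symptoms. The sketch below decodes a body according to its Content-Encoding value; decode_body is a hypothetical helper, and the page content is made up so the example runs offline.

```python
import gzip
import zlib


def decode_body(raw, content_encoding):
    """Return the decompressed body for a Content-Encoding value.

    decode_body is a hypothetical helper, not part of urllib.
    """
    encoding = (content_encoding or '').strip().lower()
    if encoding == 'gzip':
        return gzip.decompress(raw)
    if encoding == 'deflate':
        try:
            # Standard case: deflate data wrapped in a zlib header.
            return zlib.decompress(raw)
        except zlib.error:
            # Some servers send a raw deflate stream without the header.
            return zlib.decompress(raw, -zlib.MAX_WBITS)
    # No encoding, or one this sketch does not handle: return as-is.
    return raw


# Offline demonstration with a made-up page instead of a real download.
page = b'<html><body>tutorial</body></html>'
compressed = gzip.compress(page)

# gzip data starts with the two magic bytes 0x1f 0x8b, which is a
# quick way to check a suspicious file on disk.
print(compressed[:2] == b'\x1f\x8b')
print(decode_body(compressed, 'gzip') == page)
print(decode_body(page, None) == page)
```

Inside download_file() this could be applied as decode_body(response_received.read(), response_received.getheader('Content-Encoding')) before writing the file. Alternatively, removing the Accept-Encoding header from the request should normally make the server send the body uncompressed.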
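What exactly is the problem/error? –
@StephaneRolland: It does not give any error. But when I open the document in a text editor, it reports a problem with the encoding. I will upload the images in a while.. – Aditya
Which text editor? –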