HTTP Error 400 with urllib2 when trying to download a file

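Here's the thing: I'm writing a script that downloads files from different sites. The problem is that I can't figure out why it throws this error at me, while if I paste the same URL into my browser, it lets me download the file. Other sites work fine. So... here is the code: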

import os
import time
import urllib2

from bs4 import BeautifulSoup

# url is the address of the page being scraped (defined elsewhere in the script)
f = urllib2.Request(url)
f.add_header('User-Agent', 'Mozilla/5.0 Windows NT 6.3; WOW64; rv:34.0')
request = urllib2.urlopen(f)
data = request.read()
soup = BeautifulSoup(data, 'html.parser')
p_name = soup.find('h2', id="searchResults").contents[0]
if not os.path.exists(p_name):
    os.makedirs(p_name)
for a in soup.find_all('a', href="#register"):
    f = a["data-durl"]
    # The following two lines just prepare the file name
    n_file = f.split("/")[-1]
    path_file = os.path.join(p_name, n_file)
    if os.path.isfile(path_file):
        print "Firmware already downloaded. Skipping it"
    else:
        print "Downloading " + path_file
        link = urllib2.urlopen(f)
        datos = link.read()
        with open(path_file, "wb") as code:
            code.write(datos)
    time.sleep(2)
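The two lines that prepare the file name take everything after the last "/". A slightly more robust sketch is shown below; the helper name file_name_from_url is hypothetical and not part of the original script. It drops any query string and decodes percent escapes, so the saved file gets the same readable name whether data-durl is already encoded or not:

```python
import posixpath

try:  # Python 2, matching the question's urllib2 script
    from urllib import unquote
    from urlparse import urlsplit
except ImportError:  # Python 3
    from urllib.parse import unquote, urlsplit


def file_name_from_url(url):
    """Return the last path segment of url, with %XX escapes decoded."""
    path = urlsplit(url).path  # excludes any ?query or #fragment
    return unquote(posixpath.basename(path))


print(file_name_from_url(
    "http://www.downloads.netgear.com/files/GDC/ME101/"
    "ME101%20Software%20Utility%20Version%202.0.zip"))
```

This prints ME101 Software Utility Version 2.0.zip, which could then be joined onto p_name as before.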

This URL just won't work with the script: Non working url. But this one works fine: working url

I hope you can help me.

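Edit: I added the libraries I use for this. As for the stack trace: I found the error! The problem was the spaces in the name of the file it was trying to download. Using f.replace(" ", "%20") should work fine :)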


2016-02-22 Ctrl 4

Both URLs work for me. Where are you getting the error? Please post the full stack trace. – Selcuk
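One way to see where the error comes from is to catch HTTPError around each download and print the status code together with the exact URL that was sent. A minimal sketch follows; the fetch helper and its opener parameter are illustrative additions, the latter only so the function can be exercised without network access:

```python
try:  # Python 2, matching the question's urllib2 script
    from urllib2 import HTTPError, urlopen
except ImportError:  # Python 3
    from urllib.error import HTTPError
    from urllib.request import urlopen


def fetch(url, opener=urlopen):
    """Download url and return its body, reporting the URL on HTTP errors."""
    try:
        return opener(url).read()
    except HTTPError as e:
        # e.code is the HTTP status (e.g. 400); printing the URL exactly as
        # it was sent reveals problems such as unencoded spaces.
        print("HTTP %d for %r" % (e.code, url))
        raise
```

In the download loop, link = urllib2.urlopen(f) followed by link.read() could become datos = fetch(f); the re-raised exception still produces the full stack trace.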

You need to convert the spaces in the file name to their URL encoding, %20. To do that, you can use str.replace() by adding a line between the print and urlopen lines:

print "Downloading " + path_file
f = f.replace(' ', '%20')
link = urllib2.urlopen(f)

This will download from the URL

http://www.downloads.netgear.com/files/GDC/ME101/ME101%20Software%20Utility%20Version%202.0.zip

instead of from

http://www.downloads.netgear.com/files/GDC/ME101/ME101 Software Utility Version 2.0.zip

which is invalid because it contains spaces.
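The f.replace(' ', '%20') line handles spaces only. A more general sketch is shown below. It swaps in the standard library's quote function (from urllib on Python 2, urllib.parse on Python 3) in place of str.replace, and encodes every unsafe character in the path while leaving the scheme and host untouched. The helper name encode_url_path is hypothetical:

```python
try:  # Python 2, matching the question's urllib2 script
    from urllib import quote
    from urlparse import urlsplit, urlunsplit
except ImportError:  # Python 3
    from urllib.parse import quote, urlsplit, urlunsplit


def encode_url_path(url):
    """Percent-encode unsafe characters in the path component of url.

    '%' is treated as safe so an already-encoded path is not encoded twice.
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, parts.netloc, path,
                       parts.query, parts.fragment))


print(encode_url_path(
    "http://www.downloads.netgear.com/files/GDC/ME101/"
    "ME101 Software Utility Version 2.0.zip"))
```

In the loop this would be used as link = urllib2.urlopen(encode_url_path(f)). Keeping '%' safe is a trade-off: a file name that contains a literal '%' would not be encoded correctly.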

The URL still works in your browser because, when you enter a URL containing spaces, the browser automatically converts them to %20.


2016-02-22 15:49:41 wpercy

No problem! Happy hunting! – wpercy
