2013-05-16 32 views

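Downloading files in a loop using Python mechanize

I'm having trouble downloading multiple files in a loop with Python mechanize. I'm also using Beautiful Soup 4. The documentation for neither package seems to have an answer.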

Here is my code. Feel free to skip to the actual loop; I've included everything for reference:

import mechanize, cookielib, os, time 
from bs4 import BeautifulSoup 


fcList = ['abandoned mine land inventory points', 'abandoned mine land inventory polygons',
          'abandoned mine land inventory sites', 'coal mining operations', 'coal pillar location-mining',
          'industrial mineral mining operations', 'longwall mining panels', 'mine drainage treatment/land recycling project locations',
          'mined out areas', 'residual waste operations', 'underground mining permit']

dlLink = 'FTP Download' 
dloadPath = 'C:\\Users\\SomeGuy\\Downloads' 

# Browser 
br = mechanize.Browser() 

# Cookie Jar 
cj = cookielib.LWPCookieJar() 
br.set_cookiejar(cj) 

# Select the first (index zero) form 
br.select_form(nr=0) 

# Input form data 
br.form['Keyword']='mining' 
br.submit() 
html = br.response().read() 

# Pass html to beautiful soup for parse 
soup = BeautifulSoup(html) 
htmlinks = soup.findAll("a") 

# Find links with desired text 
for htmlink in htmlinks:
    string = str(htmlink.string)
    if string.lower() in fcList:
        print "Matched link!", string + ". attempting download...\n"
        try:
            req = br.click_link(text = string)
            br.open(req)
            print "URL: " + str(br.geturl)
            html = br.response().read()
            soup = BeautifulSoup(html)
            the_tag = soup.find('a', text=dlLink)
            fileURL = the_tag.get('href')
            print fileURL
            # attempt download
            fnam = string.replace(" ", "_")
            fnam = fnam.replace("/", "_")
            f = br.retrieve(fileURL, os.path.join(dloadPath, fnam + ".zip"))
            print f + "\n"
            br.back()
        except:
            print "An unknown error occurred."

Output:

>>> 
Matched link! Abandoned Mine Land Inventory Points. attempting download... 

URL: <bound method Browser.geturl of <mechanize._mechanize.Browser instance at 0x02D9D7B0>> 
http://www.pasda.psu.edu/data/dep/AMLInventoryPoints2013_04.zip 
An unknown error occurred. 
Matched link! Abandoned Mine Land Inventory Polygons. attempting download... 

An unknown error occurred. 
Matched link! Abandoned Mine Land Inventory Sites. attempting download... 

An unknown error occurred. 
Matched link! Coal Mining Operations. attempting download... 

An unknown error occurred. 
Matched link! Coal Pillar Location-Mining. attempting download... 

An unknown error occurred. 
Matched link! Industrial Mineral Mining Operations. attempting download... 

An unknown error occurred. 
Matched link! Longwall Mining Panels. attempting download... 

An unknown error occurred. 
Matched link! Mine Drainage Treatment/Land Recycling Project Locations. attempting  download... 

An unknown error occurred. 
Matched link! Mined Out Areas. attempting download... 

An unknown error occurred. 
Matched link! Residual Waste Operations. attempting download... 

An unknown error occurred. 
Matched link! Underground Mining Permit. attempting download... 

An unknown error occurred. 
>>> 

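I think the problem might be that there is no wait time between downloads. This code successfully downloads the first file in the loop, no matter which one comes first. Or maybe it's some other error I'm not aware of. I only downloaded mechanize and Beautiful Soup yesterday!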

Answers


Try this:

f = br.retrieve(fileURL, os.path.join(dloadPath, fnam + ".zip"))[0] 
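
The reason indexing with `[0]` should help: `br.retrieve` returns a `(filename, headers)` tuple, the same shape as the standard library's `urlretrieve`, so `print f + "\n"` tries to add a tuple to a string and raises a `TypeError`, which the bare `except` then hides. Here is a minimal sketch of that failure. It does not use mechanize at all; `fake_retrieve`, the example URL, and the header value are hypothetical stand-ins for illustration only.

```python
def fake_retrieve(url, filename):
    """Hypothetical stand-in that mimics the (filename, headers) return shape."""
    headers = {"Content-Type": "application/zip"}  # made-up header for illustration
    return filename, headers


result = fake_retrieve("http://example.invalid/data.zip", "data.zip")

# Concatenating the whole tuple with a string fails, as in the question's print line.
try:
    message = result + "\n"
except TypeError as exc:
    concat_error = exc
else:
    concat_error = None

# Taking element 0 gives the filename string, which concatenates fine.
saved_path = result[0]
message = saved_path + "\n"

print(repr(concat_error))
print(message)
```

If this is what is happening, the exception fires after the first file has been saved but before `br.back()` runs, so the browser is probably still on the detail page when the next `click_link` is attempted. That would explain why only the first download succeeds.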

If that doesn't work, remove the try..except and post the actual error you get.
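
If you would rather keep the handler so the loop continues, one option is to catch `Exception` and print the full traceback instead of a fixed message. This is a rough sketch using only the standard library's `traceback` module; `download_one` and `try_download` are hypothetical names, and `download_one` fails on purpose to stand in for the body of the loop.

```python
import traceback


def download_one(name):
    """Hypothetical placeholder for one loop iteration; fails on purpose."""
    result = (name + ".zip", {})  # a (filename, headers)-style tuple
    return result + "\n"  # tuple + str raises TypeError


def try_download(name):
    """Run one download and return the traceback text if it fails, else None."""
    try:
        download_one(name)
    except Exception:
        # format_exc() returns the same text Python would print for an
        # uncaught exception, including the exception type and line.
        return traceback.format_exc()
    return None


report = try_download("coal_mining_operations")
print(report)
```

The printed report names the exception type and the line that raised it, which is the information needed to diagnose the loop.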


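Thanks! Sorry for the long delay... I'll try it and get back to you as soon as I can! I thought this would never get answered. This is my first question here, and I didn't ask it very well.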