Downloading zip files with Python mechanize

I'm using Python 2.7, mechanize, and BeautifulSoup, and I can use urllib if it helps.

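OK, I want to download a couple of different zip files that are all in different HTML tables. I know which table each file is in (I know whether it is in the first, second, third... table).
Here is the HTML for the second table on the page: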

<table class="fe-form" cellpadding="0" cellspacing="0" border="0" width="50%"> 
      <tr> 
       <td colspan="2"><h2>Eligibility List</h2></td> 
      </tr> 


      <tr> 
       <td><b>Eligibility File for Met-Ed</b> - 
       <a href="/content/fecorp/supplierservices/eligibility_list.suppliereligibility.html?id=ME&ftype=1&fname=cmb_me_elig_lst_06_2013.zip">cmb_me_elig_lst_06_2013.zip</td> 
      </tr> 



      <tr> 
       <td><b>Eligibility File for Penelec</b> - 
       <a href="/content/fecorp/supplierservices/eligibility_list.suppliereligibility.html?id=PN&ftype=1&fname=cmb_pn_elig_lst_06_2013.zip">cmb_pn_elig_lst_06_2013.zip</td> 
      </tr> 



      <tr> 
       <td><b>Eligibility File for Penn Power</b> - 
       <a href="/content/fecorp/supplierservices/eligibility_list.suppliereligibility.html?id=PP&ftype=1&fname=cmb_pennelig_06_2013.zip">cmb_pennelig_06_2013.zip</td> 
      </tr> 



      <tr> 
       <td><b>Eligibility File for West Penn Power</b> - 
       <a href="/content/fecorp/supplierservices/eligibility_list.suppliereligibility.html?id=WP&ftype=1&fname=cmb_wp_elig_lst_06_2013.zip">cmb_wp_elig_lst_06_2013.zip</td> 
      </tr> 


      <tr> 
       <td>&nbsp;</td> 
      </tr> 
     </table>

I was going to use the following code just to get to the second table:

from bs4 import BeautifulSoup 
html= br.response().read() 
soup = BeautifulSoup(html) 
table = soup.find("table", class=fe-form)

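I'm guessing that class="fe-form" is wrong, because it doesn't work, but the table has no other attributes that distinguish it from the other tables. All of the tables have cellpadding="0" cellspacing="0" border="0" width="50%". I guess I can't use the find() function.

So I'm trying to get to the second table and then download the files on this page. Can someone give me some information to push me in the right direction? I have worked with forms before, but not with tables. I'm hoping there is some way to find the specific names of the zip files I'm looking for and then download them, since I will always know their names.

Thanks for any help, Tom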


2013-07-15 user1087809

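Any thoughts on this question? I want to download a zip file with Python mechanize. The zip file is not in a form. Can anyone give me hints on how to learn to do this? I've been googling for information on browsing tables with Python mechanize and can't find anything. Am I on the right track? – user1087809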

To select the table, you just need to do

table = soup.find('table', attrs={'class' : 'fe-form', 'cellpadding' : '0' })

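This assumes there is only one table with class=fe-form and cellpadding=0 in your document. If there are more, this code selects only the first one. To be sure you aren't overlooking something on the page, you can do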

tables = soup.findAll('table', attrs={'class' : 'fe-form', 'cellpadding' : '0' }) 
table = tables[0]

You could also assert that len(tables) == 1 to make sure there is only one such table.
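The question notes that every table on the page shares these attributes and that the position of the wanted table is known, so on that page the assertion would fail and picking by index may be the simpler route. A minimal sketch, assuming the bs4 package is installed; the two-table snippet and its hrefs are made up to stand in for the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: two tables with identical attributes.
html = """
<table class="fe-form" cellpadding="0">
  <tr><td><a href="/a?fname=first.zip">first.zip</a></td></tr>
</table>
<table class="fe-form" cellpadding="0">
  <tr><td><a href="/b?fname=second.zip">second.zip</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# `class` is a reserved word in Python, so bs4 accepts `class_` as the keyword.
tables = soup.find_all("table", class_="fe-form")

# Index 1 is the second matching table (lists are zero-based).
second_table = tables[1]

# Collect the hrefs of the zip links inside that table only.
zip_hrefs = [a["href"] for a in second_table.find_all("a", href=True)
             if ".zip" in a["href"]]
print(zip_hrefs)
```

The reserved-word clash is also why `class=fe-form` in the question is rejected by Python before BeautifulSoup ever sees it.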

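Now, to download the files, there are many things you could do. Assuming from your code that you already have mechanize loaded, you could do something like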

a_tags = table.findAll('a')

for a in a_tags:
    # Only follow links whose href points at a zip file
    if '.zip' in a.get('href', ''):
        # Save the file under the link text
        br.retrieve(a.get('href'), a.text)

This will download all of the files to your current working directory and name each one after its link text.
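Since the file names are known in advance and also appear in the `fname` query parameter of each link, one alternative to relying on the link text is to read the name from the href and keep only the wanted files. The sketch below uses only the standard library; the helper `zip_targets` and the `BASE_URL` placeholder are illustrative assumptions, because the page's real address is not given in the question.

```python
try:
    from urllib.parse import urljoin, urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urljoin, urlparse, parse_qs  # Python 2.7


def zip_targets(hrefs, base_url, wanted=None):
    """Map zip file names (from the `fname` parameter) to absolute URLs.

    `wanted` is an optional set of file names; None keeps every zip link.
    """
    targets = {}
    for href in hrefs:
        if not href:
            continue  # skip anchors without an href
        names = parse_qs(urlparse(href).query).get("fname", [])
        if not names or not names[0].lower().endswith(".zip"):
            continue  # not a zip download link
        name = names[0]
        if wanted is None or name in wanted:
            # Resolve the relative href against the page address.
            targets[name] = urljoin(base_url, href)
    return targets


# hrefs copied from the table in the question
hrefs = [
    "/content/fecorp/supplierservices/eligibility_list.suppliereligibility.html"
    "?id=ME&ftype=1&fname=cmb_me_elig_lst_06_2013.zip",
    "/content/fecorp/supplierservices/eligibility_list.suppliereligibility.html"
    "?id=PN&ftype=1&fname=cmb_pn_elig_lst_06_2013.zip",
]

BASE_URL = "https://example.com/some/page.html"  # placeholder, not the real site

targets = zip_targets(hrefs, BASE_URL, wanted={"cmb_pn_elig_lst_06_2013.zip"})
for name, url in targets.items():
    print(name, url)
```

Each resulting name and URL pair could then be handed to `br.retrieve(url, name)` in place of the href and link text.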


2013-08-02 17:20:00 djas

@user1087809: Was this answer useful? Or did it not work? – djas

I just haven't been on this site for a while. Thanks for your help, djas – user1087809
