How do I download .zip files from a web page using Python?

-1

This is the web page I want to download .zip files from: https://www.google.com/googlebooks/uspto-patents-grants-text.html#2010

Is there any Python code I can write, and/or can I use BeautifulSoup, to download, say, all the .zip files available for 2006?

2015-11-03 deepdeb

You can use urlretrieve, like this:

# Python 2: urlretrieve lives directly in the urllib module here.
import urllib
urllib.urlretrieve("http://storage.googleapis.com/patents/grant_full_text/2010/ipg100105.zip", "ipg100105.zip")
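
On Python 3 the same function is found in urllib.request. The following is a minimal sketch of that variant, not part of the original answer; the helper names filename_from_url and fetch are made up for illustration, and the local file name is simply taken from the last path segment of the URL.

```python
import os
import posixpath
from urllib.parse import urlsplit
from urllib.request import urlretrieve


def filename_from_url(url):
    """Return the last path segment of the URL (hypothetical helper)."""
    return posixpath.basename(urlsplit(url).path)


def fetch(url, dest_dir="."):
    """Download url into dest_dir and return the local path (hypothetical helper)."""
    local_path = os.path.join(dest_dir, filename_from_url(url))
    urlretrieve(url, local_path)
    return local_path


# Example call (needs network access):
# fetch("http://storage.googleapis.com/patents/grant_full_text/2010/ipg100105.zip")
```

Calling fetch with the URL from the answer above should leave ipg100105.zip in the current directory, assuming the file is still hosted at that address.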


2015-11-03 22:33:14

You can also use the wget module.

>>> import wget 
>>> url = 'http://www.example.com/mp3/mysong.mp3' 
>>> filename = wget.download(url) 
100% [................................................] 3841532/3841532 
>>> filename 
'mysong.mp3'


2015-11-03 22:37:49 thermite

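The problem with BeautifulSoup may be that the h3 is not a parent of the zip links.

You can fetch the HTML (using requests.get(URL).text), find the h3 for the year you want, and keep everything up to the next h3 (or the end of the text).

Then you can use bs4 on that section, or just a regular expression matching <a href="something">.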
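
The approach described above can be sketched with the standard library alone. This is an illustration under assumptions, not a tested scraper for the real page: it assumes each year is introduced by an h3 heading whose text contains the year, and that the links use double-quoted href attributes. The function name zip_links_for_year is made up for this example.

```python
import re

# Matches an <h3> heading and captures its inner HTML.
H3_RE = re.compile(r"<h3[^>]*>(.*?)</h3>", re.IGNORECASE | re.DOTALL)
# Captures the value of a double-quoted href inside an <a> tag.
HREF_RE = re.compile(r'<a\s[^>]*?href="([^"]+)"', re.IGNORECASE)


def zip_links_for_year(html, year):
    """Return .zip hrefs found between the h3 for `year` and the next h3."""
    headings = list(H3_RE.finditer(html))
    links = []
    for i, heading in enumerate(headings):
        # Drop any tags nested inside the heading before comparing text.
        title = re.sub(r"<[^>]+>", "", heading.group(1))
        if year not in title:
            continue
        start = heading.end()
        # The section ends at the next h3, or at the end of the document.
        end = headings[i + 1].start() if i + 1 < len(headings) else len(html)
        section = html[start:end]
        links.extend(h for h in HREF_RE.findall(section) if h.lower().endswith(".zip"))
    return links


# Example call (needs the third-party requests package and network access):
# html = requests.get(URL).text
# print(zip_links_for_year(html, "2006"))
```

A regular expression is fragile against unusual markup, so swapping the link extraction for bs4 on the extracted section is a reasonable alternative.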


2015-11-03 22:38:48

Change "yearToGet" to download the files for a given year.

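# Python 2 code: urllib2 was merged into urllib.request in Python 3.
from bs4 import BeautifulSoup
from urllib2 import Request, urlopen

yearToGet = '2006'

sourcePage = urlopen(Request('https://www.google.com/googlebooks/patents-grants-text.html'))
soup = BeautifulSoup(sourcePage.read(), 'html.parser')

links = soup.find_all('a')

for link in links:
    # Anchors without an href are treated as empty instead of raising KeyError.
    href = link.get('href', '')

    if yearToGet in href and '.zip' in href:
        remoteZip = urlopen(Request(href))
        file_name = href.rpartition('/')[-1]
        # The with block closes the file even if the write fails.
        with open(file_name, 'wb') as local_file:
            local_file.write(remoteZip.read())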


2015-11-03 23:18:18

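Modify the code if you want to download files from a different year. If you want to download the files more elegantly, I'm sure you can figure it out. Cheers!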

from urllib.request import urlopen 
from bs4 import BeautifulSoup 
import re, webbrowser 


html = urlopen("https://www.google.com/googlebooks/uspto-patents-grants-text.html#2010") 
soup = BeautifulSoup(html.read(), "html.parser") 

#linkList = soup.findAll("a") 
linkList = [x.text for x in soup.findAll("a", text=re.compile(""))] 

for item in linkList:
    if 'ipg06' in item:
        item = item.strip('\n')
        # Open the URL with the item name appended at the end.
        # The browser then downloads the file for you.
        webbrowser.open("http://storage.googleapis.com/patents/grant_full_text/2006/" + item)
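
One possible "more elegant" variant is to write the files from Python instead of handing each URL to the browser. The sketch below is only an illustration: it reuses the base URL and the 'ipg06' filter from the code above, assumes (as that code does) that each link's text is the archive file name, and introduces the hypothetical helpers archive_urls and download.

```python
import shutil
from urllib.request import urlopen

BASE_URL = "http://storage.googleapis.com/patents/grant_full_text/2006/"


def archive_urls(link_texts, prefix="ipg06", base_url=BASE_URL):
    """Build download URLs from link texts that look like matching .zip names."""
    urls = []
    for text in link_texts:
        name = text.strip()
        if prefix in name and name.endswith(".zip"):
            urls.append(base_url + name)
    return urls


def download(url):
    """Stream the response to a local file named after the last URL segment."""
    file_name = url.rpartition("/")[-1]
    with urlopen(url) as response, open(file_name, "wb") as out:
        # copyfileobj copies in chunks, so a large archive is never held in memory whole.
        shutil.copyfileobj(response, out)
    return file_name


# Example call (needs network access), using linkList from the code above:
# for url in archive_urls(linkList):
#     download(url)
```

Streaming with shutil.copyfileobj is a design choice made here because the archives may be large; reading the whole response with read() would also work for small files.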


2015-11-03 23:41:30 deedle
