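Python - download PDF files from (non-.pdf) URLs
I want to download around 20 PDF files from a site that requires a login. This is what I have so far, but it does not download any valid PDF files (they are all corrupted). I am also new to Python.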
import mechanize
import urllib2

def download_file(download_url):
    response = urllib2.urlopen(download_url)
    print response.geturl()
    print response.read()
    file = open("document.pdf", 'wb')
    file.write(response.read())
    file.close()

brwser = mechanize.Browser()
brwser.addheaders = [('User-agent', 'Firefox')]
response = brwser.open(url)
brwser.select_form(nr = 0)
brwser.form['UserName'] = 'username'
brwser.form['Password'] = 'password'
nextpage = brwser.submit()

# Navigate to the page I want
for link in brwser.links():
    if link.text == 'Some pdf':
        request = brwser.follow_link(link)
        download_file(link.url)
I don't know what else to try. The URLs for the PDF files all look like this:
https://example.com/something/source2.aspx?id=e9a9bfdc-7d97-e411-9e03-76439cf4d30e
Also, response.read() returns the following:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>
Source
</title>
<script type='text/javascript'>
window.onload = function() {
var url = window.location.href.replace('source.aspx?', 'source2.aspx?');
window.location = url;
};
</script>
</head>
<body>
<div style='position:fixed; height:100%; width:100%; overflow:hidden; top:100px; left:100px;'>Loading, please wait.</div>
</body>
</html>
So, how can I download these files?
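One possible direction, sketched under a few assumptions rather than tested against the real site: the HTML above looks like an interstitial page whose JavaScript swaps 'source.aspx?' for 'source2.aspx?', and neither mechanize nor urllib2 executes JavaScript, so the same rewrite would have to be done in Python. A separate urllib2.urlopen call also does not reuse the cookies from the mechanize login, and download_file calls response.read() twice, so the second call has nothing left to write. The helper names below (resolve_pdf_url, looks_like_pdf, download_pdf) are hypothetical, and the sketch uses Python 3 syntax rather than the Python 2 of the question.

```python
from urllib.parse import urljoin


def resolve_pdf_url(base_url, link_url):
    """Make the link absolute and mimic the page's JavaScript redirect."""
    absolute = urljoin(base_url, link_url)
    # Same substitution as the window.onload handler in the returned HTML.
    return absolute.replace('source.aspx?', 'source2.aspx?')


def looks_like_pdf(data):
    """A PDF normally starts with the bytes %PDF; an HTML page does not."""
    return data.startswith(b'%PDF')


def download_pdf(opener, url, path):
    """Fetch url through the logged-in opener and save it if it is a PDF.

    `opener` is assumed to be any object with an open(url) method whose
    result has read(), such as the browser that submitted the login form.
    Returns True if a file was written, False otherwise.
    """
    data = opener.open(url).read()  # read the body once and reuse it
    if not looks_like_pdf(data):
        return False  # probably still the "Loading, please wait." page
    with open(path, 'wb') as f:
        f.write(data)
    return True
```

In the loop from the question this might be used as download_pdf(brwser, resolve_pdf_url(link.base_url, link.url), 'document_%d.pdf' % i), with i coming from enumerate so that each file gets its own name instead of overwriting document.pdf. Passing brwser itself keeps the login cookies on the request.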
I'll give that a try as soon as I can. Thanks. – roger168168