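Unable to download a PDF file with Python requests

I'm trying to download files with Python requests. Images download fine, but for PDF files the content is empty.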
from requests import session

# USERNAME, PASSWORD, BASE_URL, LOGIN_URL, REPLAY_LIMIT, FF_USER_AGENT,
# the ld/li logging helpers, debug_http_request() and Scraper.authent()
# are defined elsewhere in the original script.

class Scraper():
    def __init__(self, username=USERNAME, password=PASSWORD,
                 base_url=BASE_URL, login_url=LOGIN_URL, debug=False):
        self.session = session()
        self.authent()
        if debug:
            debug_http_request()

    def get(self, url, *arg, **kw):
        r = None
        for i in range(REPLAY_LIMIT):
            ld('getting %s (count %d)...' % (url, i))
            # Forward extra keyword arguments (e.g. stream=True) to requests;
            # without **kw they were silently dropped.
            r = self.session.get(url, headers=FF_USER_AGENT,
                                 allow_redirects=False, **kw)
            ld('response code %d ' % r.status_code)
            if r.status_code in (200, 201):
                return r
            if r.status_code == 302 and r.url == BASE_URL:
                # Session expired: log in again and retry.
                li("redirected to >> " + r.url)
                self.authent()
        return r

    def get_files_content(self, file_ids):
        for f in set(file_ids):
            url = ("very long url multiple lines string") % f
            file_result = self.get(url, stream=True)
            for block in file_result.iter_content(1024):
                if not block:
                    break
                print block
            print "end of block"
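The loop above only prints each block. A minimal sketch of writing a streamed response to disk instead, assuming a response-like object that exposes `iter_content(chunk_size)` the way a requests `Response` fetched with `stream=True` does. `save_stream`, `FakeResponse` and the target path are hypothetical names used only for illustration. The function returns the number of bytes written, so an empty download like the one in this question shows up as 0 instead of as a silently empty file.

```python
import os
import tempfile


def save_stream(response, path, chunk_size=1024):
    """Write a streamed HTTP response body to `path` and return bytes written.

    `response` is assumed to behave like requests.Response obtained with
    stream=True, i.e. it provides iter_content(chunk_size).
    """
    written = 0
    with open(path, "wb") as fh:
        for block in response.iter_content(chunk_size):
            if not block:  # ignore empty chunks instead of stopping early
                continue
            fh.write(block)
            written += len(block)
    return written


class FakeResponse:
    """Hypothetical stand-in for requests.Response, for offline testing."""

    def __init__(self, body):
        self._body = body

    def iter_content(self, chunk_size):
        # Yield the body in slices of at most chunk_size bytes.
        for i in range(0, len(self._body), chunk_size):
            yield self._body[i:i + chunk_size]


if __name__ == "__main__":
    target = os.path.join(tempfile.gettempdir(), "example.pdf")
    n = save_stream(FakeResponse(b"%PDF-1.4 ..."), target, chunk_size=4)
    print("wrote", n, "bytes to", target)
```

With a real session, `FakeResponse(...)` would be replaced by the object returned from `self.get(url, stream=True)`; checking for a return value of 0 makes the problem visible immediately.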
When I try to get the file content with:
ser = Scraper(debug=True)
print ser.get_files_content([60857])
I get the following debug output:
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Fri, 27 Sep 2013 14:29:51 GMT
header: Server: Apache/2.2.16 (Debian)
header: X-Powered-By: PHP/5.3.18-1~dotdeb.0
header: Expires: Mon, 26 Jul 1997 05:00:00 GMT
header: Content-Transfer-Encoding: binary
header: Cache-control: private, must-revalidate
header: Pragma: no-cache
header: Content-Disposition: attachment; filename="the wanted file name";
header: Last-Modified: Fri, 27 Sep 2013 14:29:51 GMT
header: Content-Length: 0
header: Keep-Alive: timeout=15, max=96
header: Connection: Keep-Alive
header: Content-Type: application/pdf;
The response has no content. The code above works for other documents, such as images. Thanks a lot.
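Because the headers already announce `Content-Length: 0`, the emptiness comes from the server, not from how the body is read. A sketch of a check that reports this before the result is treated as a PDF; `diagnose_download` is a hypothetical helper, `headers` is assumed to be a plain dict of response headers and `body` the bytes received. The signature test assumes the file begins with the `%PDF-` header line that PDF files normally start with.

```python
def diagnose_download(headers, body):
    """Return a short reason why `body` is not a usable PDF, or 'ok'."""
    # HTTP header names are case-insensitive, so normalise them first.
    h = {k.lower(): v for k, v in headers.items()}
    declared = h.get("content-length")
    # "application/pdf;" -> "application/pdf"
    ctype = h.get("content-type", "").split(";")[0].strip().lower()
    if declared is not None and declared.strip() == "0":
        return "server declared Content-Length: 0 (nothing was sent)"
    if not body:
        return "empty body"
    if ctype == "application/pdf" and not body.startswith(b"%PDF-"):
        return "Content-Type is PDF but body lacks the %PDF- signature"
    return "ok"


if __name__ == "__main__":
    # Headers as reported in the debug output of this question.
    print(diagnose_download(
        {"Content-Length": "0", "Content-Type": "application/pdf;"}, b""))
```

The last branch also catches the common case where a server answers with an HTML login or error page while still labelling it as a PDF.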
'Content-Length: 0'. Could you send the request from a browser and make sure the request your program generates looks the same? – Blender
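One way to follow this suggestion is to diff the request headers copied from the browser's developer tools against the ones the script actually sends (requests exposes the latter as `response.request.headers`). A sketch; `diff_headers` and the sample header values are hypothetical. A missing `Referer` or session cookie is one kind of difference worth looking for.

```python
def diff_headers(browser, script):
    """Compare two header mappings case-insensitively.

    Returns (missing, different): names the script does not send at all,
    and names it sends with a different value than the browser.
    """
    b = {k.lower(): v for k, v in browser.items()}
    s = {k.lower(): v for k, v in script.items()}
    missing = sorted(set(b) - set(s))
    different = sorted(k for k in b.keys() & s.keys() if b[k] != s[k])
    return missing, different


if __name__ == "__main__":
    browser = {"User-Agent": "Mozilla/5.0", "Referer": "https://example.com/list",
               "Accept": "*/*"}
    script = {"user-agent": "Mozilla/5.0", "Accept": "application/json"}
    print(diff_headers(browser, script))  # (['referer'], ['accept'])
```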
Have you heard of [Scrapy](http://scrapy.org/)? Anyway, check this URL: http://stackoverflow.com/questions/14669827/python-urlretrieve-pdf-downloading – lucasg
I saw that post, but I think this is a way for the service to prevent scraping... –