2013-09-27 36 views

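Unable to download a PDF file using Python requests

I am trying to download files using Python requests. I can download images, but for PDF files the content is empty.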

class Scraper():
    def __init__(self, username=USERNAME, password=PASSWORD,
                 base_url=BASE_URL, login_url=LOGIN_URL, debug=False):
        self.session = session()
        self.authent()
        if debug:
            debug_http_request()

    def get(self, url, *arg, **kw):
        r = None
        for i in range(REPLAY_LIMIT):
            ld('getting %s (count %d)...' % (url, i))
            # Forward extra keyword arguments (e.g. stream=True);
            # they were previously accepted but silently dropped.
            r = self.session.get(url, headers=FF_USER_AGENT,
                                 allow_redirects=False, **kw)
            ld('response code %d ' % r.status_code)
            if r.status_code in (200, 201):
                return r
            # Note: with allow_redirects=False, r.url is the requested URL;
            # the redirect target is in r.headers['Location'].
            if r.status_code == 302 and r.url == BASE_URL:
                li("redirected to >> " + r.url)
                self.authent()
        return r

    def get_files_content(self, file_ids):
        for f in set(file_ids):
            url = ("very long url multiple lines string") % f
            file_result = self.get(url, stream=True)
            for block in file_result.iter_content(1024):
                if not block:
                    break
                print(block)
                print("end of block")

When I try to get the file content:

ser = Scraper(debug=True)
ser.get_files_content([60857])

I get the following debug output:

reply: 'HTTP/1.1 200 OK\r\n' 
header: Date: Fri, 27 Sep 2013 14:29:51 GMT 
header: Server: Apache/2.2.16 (Debian) 
header: X-Powered-By: PHP/5.3.18-1~dotdeb.0 
header: Expires: Mon, 26 Jul 1997 05:00:00 GMT 
header: Content-Transfer-Encoding: binary 
header: Cache-control: private, must-revalidate 
header: Pragma: no-cache 
header: Content-Disposition: attachment; filename="the wanted file name"; 
header: Last-Modified: Fri, 27 Sep 2013 14:29:51 GMT 
header: Content-Length: 0 
header: Keep-Alive: timeout=15, max=96 
header: Connection: Keep-Alive 
header: Content-Type: application/pdf; 

There is no content in the response. This code works for other documents, such as images. Thanks a lot.

`Content-Length: 0`. Can you send the request with a browser and make sure that your programmatically generated request is similar? – Blender

Have you heard of [Scrapy](http://scrapy.org/)? In any case, check this URL: http://stackoverflow.com/questions/14669827/python-urlretrieve-pdf-downloading – lucasg

I saw that post, but I think this is a way for the service to prevent scraping... –
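One way to act on Blender's suggestion is to copy the request headers shown in the browser's developer tools and compare them with the headers the script actually sent, which requests exposes as `response.request.headers`. The following is only a sketch: `diff_headers` is a hypothetical helper, not part of requests, and the header values in the usage comment are made up.

```python
def diff_headers(browser_headers, script_headers):
    """Return {name: (browser_value, script_value)} for headers that differ.

    Header names are compared case-insensitively, since HTTP header
    names are not case-sensitive. A missing header is reported as None.
    """
    browser = {k.lower(): v for k, v in browser_headers.items()}
    script = {k.lower(): v for k, v in script_headers.items()}
    differences = {}
    for name in sorted(set(browser) | set(script)):
        if browser.get(name) != script.get(name):
            differences[name] = (browser.get(name), script.get(name))
    return differences


# Usage sketch (hypothetical values):
#   browser = {"User-Agent": "Mozilla/5.0", "Referer": "http://example.invalid/list"}
#   r = scraper.get(url, stream=True)
#   print(diff_headers(browser, dict(r.request.headers)))
```

A missing `Referer` or `Cookie` header is the kind of difference that might make a server return an empty body, though whether that is the cause here is only a guess.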

Answer

Your server also thinks that the PDF file has zero length:

Content-Length: 0 

Please debug the problem on your server. Maybe the upload went wrong?
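On the client side, it may help to fail loudly when the server declares an empty body instead of silently printing nothing. Below is a minimal sketch, assuming a response object that offers `headers` and `iter_content()` the way a requests response does; `save_body` and `EmptyBodyError` are hypothetical names introduced here for illustration.

```python
class EmptyBodyError(Exception):
    """Raised when the server sends no body for a download."""


def save_body(response, fileobj, chunk_size=1024):
    """Stream the response body into fileobj and return the bytes written.

    Raises EmptyBodyError if the server declares Content-Length: 0
    or if no data arrives at all.
    """
    declared = response.headers.get("Content-Length")
    if declared is not None and declared.strip() == "0":
        raise EmptyBodyError("server declared Content-Length: 0")

    written = 0
    for block in response.iter_content(chunk_size):
        if block:  # skip empty chunks
            fileobj.write(block)
            written += len(block)

    if written == 0:
        raise EmptyBodyError("no data received")
    return written


# Usage sketch (assumes the `requests` package and a hypothetical URL):
#   import requests
#   with requests.Session() as s:
#       r = s.get("http://example.invalid/file?id=60857", stream=True)
#       with open("out.pdf", "wb") as fh:
#           save_body(r, fh)
```

This does not fix the empty response. It only turns it into an explicit error, so that the script shows the same failure the debug headers show.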