2017-05-20

How do I change the name of a PDF file downloaded with Python requests? Renaming downloaded files with Python requests

I want to save the file as Manual_name1.pdf, not as Elkinson%20Jeffrey.pdf.

The CSV file looks like this:

Manual_name1 https://www.adndrc.org/diymodule/doc_panellist/Elkinson%20Jeffrey.pdf 
Manual_name2 http://www.parliament.bm/uploadedFiles/Content/House_Business/Presentation_of_Papers_and_of_Reports/PCA%20Report%209262014.pdf 
manual_name3 http://www.ohchr.org/Documents/HRBodies/OPCAT/elections2016/HaimoudRamdan.pdf 

My current code:

import os 
import csv 
import requests 

write_path = 'C:\\Users\\hgdht\\Desktop\\Downloader_Automation' # ASSUMING THAT FOLDER EXISTS! 

with open('Links.csv', 'r') as csvfile:
    spamreader = csv.reader(csvfile)
    for link in spamreader:
        if not link:
            continue
        print('-'*72)
        pdf_file = link[0].split('/')[-1]
        with open(os.path.join(write_path, pdf_file), 'wb') as pdf:
            try:
                # Try to request PDF from URL
                print('TRYING {}...'.format(link[0]))
                a = requests.get(link[0], stream=True)
                for block in a.iter_content(512):
                    if not block:
                        break

                    pdf.write(block)
                print('OK.')
            except requests.exceptions.RequestException as e:  # This will catch ONLY Requests exceptions
                print('REQUESTS ERROR:')
                print(e)  # This should tell you more details about the error

Answers

Instead of

pdf_file = link[0].split('/')[-1] 

take the file name from a specific column of the CSV file:

pdf_file = link[1] # (assuming the file name is in the second column) 

If the file name is in the first column, you should use

pdf_file = link[0] # (assuming the file name is in the first column) 
# OR 
import time # put this at the beginning of your script 
pdf_file = '{}-{}.pdf'.format(link[0], int(time.time())) 
# file name will look like: "name-1495460691.pdf" 

But then you will also have to change which column the link itself is read from when calling requests:

a = requests.get(link[1], stream=True) # (assuming the link is in the second column) 
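
Putting these pieces together, a minimal sketch might look like the one below. It assumes Python 3 and a comma-separated CSV with the desired name in the first column and the URL in the second. The helper names `read_jobs` and `download_pdfs` are made up for illustration, and `.pdf` is appended whenever the name in the CSV lacks it.

```python
import csv
import os

import requests


def read_jobs(csvfile):
    """Yield (file_name, url) pairs from an open CSV file.

    Assumption: column 0 holds the desired name, column 1 the URL.
    """
    for row in csv.reader(csvfile):
        if len(row) < 2 or not row[0].strip() or not row[1].strip():
            continue  # skip empty or incomplete rows
        name = row[0].strip()
        if not name.lower().endswith('.pdf'):
            name += '.pdf'  # make sure the saved file has a file type
        yield name, row[1].strip()


def download_pdfs(csv_path, write_path):
    """Download every URL listed in csv_path into the existing folder write_path."""
    with open(csv_path, 'r', newline='') as csvfile:
        for name, url in read_jobs(csvfile):
            print('TRYING {}...'.format(url))
            try:
                response = requests.get(url, stream=True, timeout=30)
                response.raise_for_status()  # treat HTTP errors (404, 500, ...) as failures
                # Open the target only after the request succeeded,
                # so a failed request does not leave an empty file behind.
                with open(os.path.join(write_path, name), 'wb') as pdf:
                    for block in response.iter_content(8192):
                        pdf.write(block)
                print('OK.')
            except requests.exceptions.RequestException as e:
                print('REQUESTS ERROR:')
                print(e)
```

With the folder from the question, this would be called as `download_pdfs('Links.csv', write_path)`. Rows that share the same name still overwrite each other here.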

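It works. However, the files are saved without any file type, and if I have 2 or 3 identical names in column 1, the file gets replaced again and again. How can I put a timestamp in the file name so that files with the same name are not replaced? @errata – WarLock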

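@WarLock Of course it replaces files with the same name :) You have to make sure that all names are unique. That is a "feature" of every operating system... I updated my answer to add a timestamp to each file name. – errata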
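
One caveat with the timestamp approach: `int(time.time())` has one-second resolution, so two rows with the same name that are downloaded within the same second would still collide. An alternative, sketched below under the assumption that every download is a PDF, is to check the target folder and append a counter until the name is free. The function name `unique_path` is hypothetical.

```python
import os


def unique_path(folder, file_name):
    """Return a path inside folder that does not exist yet.

    'name.pdf' is used when it is free; otherwise 'name_1.pdf',
    'name_2.pdf', ... are tried until an unused name is found.
    """
    stem, ext = os.path.splitext(file_name)
    if ext.lower() != '.pdf':
        # Assumption: every download is a PDF, so anything that does not
        # already end in .pdf is treated as a bare name.
        stem, ext = file_name, '.pdf'
    candidate = os.path.join(folder, stem + ext)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(folder, '{}_{}{}'.format(stem, counter, ext))
        counter += 1
    return candidate
```

In the download loop, the file would then be opened with `open(unique_path(write_path, pdf_file), 'wb')` instead of `open(os.path.join(write_path, pdf_file), 'wb')`.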

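What if we have multiple links in columns B, C and D next to the same 'manual_name', and they should all be saved under that name? How can we read those links? @errata – WarLock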
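
One possible way to handle several links per row, as a rough sketch: treat the first cell as the name and every further non-empty cell as a URL, then number the resulting files so they do not overwrite each other. The helpers `expand_row` and `read_all_jobs` and the `_1`, `_2` suffix scheme are assumptions made for illustration, not something from the answer above.

```python
import csv


def expand_row(row):
    """Turn ['name', url_b, url_c, ...] into [(file_name, url), ...].

    Empty cells are ignored. A single link keeps the plain name;
    several links get a numeric suffix so they do not overwrite each other.
    """
    if not row or not row[0].strip():
        return []
    name = row[0].strip()
    urls = [cell.strip() for cell in row[1:] if cell.strip()]
    if len(urls) == 1:
        return [(name + '.pdf', urls[0])]
    return [('{}_{}.pdf'.format(name, i), url)
            for i, url in enumerate(urls, start=1)]


def read_all_jobs(csvfile):
    """Collect (file_name, url) pairs for every row of an open CSV file."""
    jobs = []
    for row in csv.reader(csvfile):
        jobs.extend(expand_row(row))
    return jobs
```

Each `(file_name, url)` pair can then be downloaded exactly like a single link, with `requests.get(url, stream=True)` writing to `os.path.join(write_path, file_name)`.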