Python script to iterate through PDFs in a directory and find a matching line

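Currently, all of my reports are emailed to me as PDFs. I have set up Outlook to download those files into a directory automatically every day. Sometimes a PDF contains no data at all, only the line "There is no data to present that matches the selection criteria". I want to write a Python program that goes through every PDF in that directory, opens it, and looks for that phrase. If the file contains the phrase, the program should delete that PDF; if it does not, the program should do nothing. With help from reddit I have pieced together the code below: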

import PyPDF2
import os

directory = 'C:\\Users\\jmoorehead\\Desktop\\A2IReports\\'
for file in os.listdir(directory):
    if not file.endswith(".pdf"):
        continue
    with open("{}/{}".format(directory,file), 'rb') as pdfFileObj:
        pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
        pageObj = pdfReader.getPage(0)
        if "There is no data to present that matches the selection criteria" in pageObj.extractText():
            print("{} was removed.".format(file))
            os.remove(file)

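I tested it with 3 files, one of which contains the matching phrase. It fails no matter how the files are named or what order they are in. I also tested it with a single file in the directory, named 3.pdf. Below is the error I get.

FileNotFoundError: [WinError 2] The system cannot find the file specified: '3.pdf'

This would greatly reduce my workload, and it is a good learning example for a newbie like me. All help and criticism are welcome.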

Source

2017-06-14 user3487244

You have a forward slash instead of a backslash: '{}/{}' – jsmiao

Building file paths with string substitution often leads to typos like this. Try using 'os.path.join(path, *paths)', documented here: https://docs.python.org/2/library/os.path.html – jsmiao
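As a minimal sketch of that suggestion: `os.path.join` inserts a separator only when one is needed, so a directory with a trailing backslash does not end up with a doubled or mixed separator. The helper name `report_path` and its `join` parameter are hypothetical, added only for illustration; `ntpath.join` is the Windows implementation of `os.path.join` and is used here so the result is the same on any operating system.

```python
import ntpath
import os


def report_path(directory, name, join=os.path.join):
    """Build the full path of a report (hypothetical helper, not from the thread)."""
    return join(directory, name)


# The directory string from the question, including its trailing backslash.
windows_dir = 'C:\\Users\\jmoorehead\\Desktop\\A2IReports\\'

# ntpath.join applies Windows rules regardless of the platform running this.
print(report_path(windows_dir, '3.pdf', join=ntpath.join))

# A directory without a trailing separator gets exactly one added.
print(report_path('C:\\Reports', '3.pdf', join=ntpath.join))
```

In the script itself the default `os.path.join` would be enough, since it already resolves to the Windows rules when run on Windows.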

Here is my new code -> https://repl.it/Ilkx/0 It gives a new error message, which may be progress. The error is 'TypeError: expected str, bytes or os.PathLike object, not module'. I'm sure it's because I don't know what I'm doing. – user3487244
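The linked code is not reproduced in this thread, so the exact cause is unknown. One way this message can arise, shown here purely as an assumption, is passing a module object such as `os` where a path string is expected. The helper `describe_path_error` is hypothetical and exists only to surface the message.

```python
import os


def describe_path_error(candidate):
    """Return the TypeError message raised when joining candidate, or None."""
    try:
        os.path.join(candidate, '3.pdf')
    except TypeError as error:
        return str(error)
    return None


# Passing the os module itself instead of a directory string triggers the error.
print(describe_path_error(os))

# A plain string is accepted, so no error message is returned.
print(describe_path_error('C:\\Reports'))
```

If the real code looks similar, replacing the module argument with the `directory` string should make the TypeError go away.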

See below:

import PyPDF2
import os

directory = 'C:\\Users\\jmoorehead\\Desktop\\A2IReports\\'
for file in os.listdir(directory):
    if not file.endswith(".pdf"):
        continue
    with open(os.path.join(directory,file), 'rb') as pdfFileObj: # Changes here
        pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
        pageObj = pdfReader.getPage(0)
        if "There is no data to present that matches the selection criteria" in pageObj.extractText():
            print("{} was removed.".format(file))
            os.remove(file)

Source

2017-06-14 20:04:53 jsmiao

That produces the error "FileNotFoundError: [WinError 2] The system cannot find the file specified: '3.pdf'" – user3487244

It looks like you need to give 'os.remove(file)' the full file path. Try 'os.remove(os.path.join(directory, file))' and see if it works. – jsmiao

Getting closer! "PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\jmoorehead\\Desktop\\A2IReports\\3.pdf'" – user3487244
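The PermissionError is consistent with `os.remove` being called while the PDF is still open inside the `with` block, which Windows does not allow. A minimal sketch of one way to combine the fixes discussed in this thread follows: build the full path once, read the text, let the file close, and only then delete. The function names `first_page_text` and `remove_empty_reports` and the `extract_text` parameter are hypothetical, introduced so the deletion logic can be tried without real PDFs. The PyPDF2 calls are the ones used in the question; as far as I know, newer PyPDF2 releases renamed them (`PdfReader`, `reader.pages[0]`, `extract_text()`), so adjust for the installed version.

```python
import os


def first_page_text(path):
    """Return the text of page 0, using the PyPDF2 calls from the question."""
    import PyPDF2  # imported lazily so the rest of the sketch runs without it

    # The with block closes the file before this function returns.
    with open(path, 'rb') as pdf_file:
        reader = PyPDF2.PdfFileReader(pdf_file)
        return reader.getPage(0).extractText()


def remove_empty_reports(directory, phrase, extract_text=first_page_text):
    """Delete every .pdf in directory whose extracted text contains phrase."""
    removed = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith('.pdf'):
            continue
        path = os.path.join(directory, name)  # full path, not the bare name
        if phrase in extract_text(path):
            # extract_text has already closed the file, so removal is allowed.
            os.remove(path)
            removed.append(name)
            print("{} was removed.".format(name))
    return removed
```

Under these assumptions it could be called as `remove_empty_reports('C:\\Users\\jmoorehead\\Desktop\\A2IReports\\', "There is no data to present that matches the selection criteria")`. Returning the list of removed names is a design choice for easier checking, not something the thread asked for.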
