Extracting the first two lines of a PDF with Python and pyPDF

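I am using Python 2.7 and pyPDF to read the title metadata from PDF files. Unfortunately, not all PDFs have metadata. What I want to do now is get the first two lines of text from each PDF. How can I modify the code I have so that it captures the first two lines with pyPDF?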

from pyPdf import PdfFileWriter, PdfFileReader
import os

for fileName in os.listdir('.'):
    try:
        if fileName.lower()[-3:] != "pdf":
            continue
        input1 = PdfFileReader(file(fileName, "rb"))

        # print the title of document1.pdf
        print fileName, input1.getDocumentInfo().title
    except:
        print ",",

Source

2016-09-29 acctman

from PyPDF2 import PdfFileReader
import StringIO

fileName = "HMM.pdf"
try:
    if fileName.lower()[-3:] == "pdf":
        input1 = PdfFileReader(file(fileName, "rb"))

        # extract the text of the first page
        content = input1.getPage(0).extractText()

        # read the text line by line and print the first two lines
        buf = StringIO.StringIO(content)
        print buf.readline().rstrip()
        print buf.readline().rstrip()

except Exception as e:
    # report the failure instead of silently printing a comma
    print fileName, "could not be read:", e

My working directory contains this "HMM.pdf" file, and the code runs fine on Python 2.7.
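To apply the same idea to every PDF in a folder, as in the question's loop, a minimal sketch might look like the following. The helper names `first_lines`, `first_page_text`, and `print_first_lines` are made up for illustration, and skipping blank lines is my own assumption about what "first two lines" should mean. It uses the same PyPDF2 1.x calls as above (`PdfFileReader`, `getPage`, `extractText`); newer PyPDF2 releases renamed these. How well line breaks survive `extractText()` depends on the PDF.

```python
import os


def first_lines(text, count=2):
    """Return the first `count` non-empty lines of `text`, stripped."""
    lines = [line.strip() for line in text.splitlines()]
    return [line for line in lines if line][:count]


def first_page_text(path):
    """Extract the text of page 0 with the PyPDF2 1.x API used above."""
    # Imported here so first_lines() can be used without PyPDF2 installed.
    from PyPDF2 import PdfFileReader
    with open(path, "rb") as handle:
        reader = PdfFileReader(handle)
        return reader.getPage(0).extractText()


def print_first_lines(folder="."):
    """Print the first two text lines of every PDF in `folder`."""
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".pdf"):
            continue
        try:
            lines = first_lines(first_page_text(os.path.join(folder, name)))
        except Exception as error:
            # Report the failure instead of hiding it behind a bare except.
            print("%s could not be read: %s" % (name, error))
            continue
        print("%s: %s" % (name, " | ".join(lines)))
```

Calling `print_first_lines('.')` from the folder that holds the PDFs prints one line per file. Any file that cannot be parsed is reported with its error, which makes "no output" cases easier to diagnose.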

Source

2016-09-29 04:53:50

Can you show me a complete example based on my current code? When I add the code given above, it does not output anything. – acctman

Edited the code. –

Hmm, what am I missing? There is still no output. I have 5 PDF files in a folder, and when I run the script nothing is printed. – acctman
