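Extracting text from a PDF with PDFBox 2.0.2: class PDFTextStripper is missing

I implemented a simple text-extraction method in Java using PDFBox 1.8.10. For several reasons I have to upgrade the library to PDFBox 2.0.2. It looks as though the PDFTextStripper class was either removed or moved to a different package in the new version. Is there a way to solve this? Or can you suggest another way to get the text from a PDF?

Here is my code: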

public String extractTextFromPdf() throws IOException {
    File jInputFile = new File("c:/lorem/ipsum.pdf");
    // try-with-resources closes the document even if text extraction fails
    try (PDDocument pdDoc = PDDocument.load(jInputFile)) {
        return new PDFTextStripper().getText(pdDoc);
    }
}

Thanks in advance.

Source

2016-08-01 brootforce

Which IDE are you using? In NetBeans, press Ctrl-Shift-I and the imports will be fixed automatically. In Eclipse, press Ctrl-Shift-O. –

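@TilmanHausherr Thank you. I am using Eclipse. After restarting it, the problem was fixed, so I think it was a temporary error. PDFBox moved the PDFTextStripper class from the 'org.apache.pdfbox.util' package to the 'org.apache.pdfbox.text' package. What a development... – brootforce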
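For anyone unsure which PDFBox line their project actually resolves, here is a minimal sketch that probes both package names mentioned above by reflection. It uses only the JDK, so it compiles without PDFBox; the class name PdfTextStripperLocator and the method locate are hypothetical names chosen for this illustration, not part of PDFBox.

```java
import java.util.Optional;

// Hypothetical helper for illustration; not part of PDFBox.
class PdfTextStripperLocator {
    // Fully qualified names to probe: the 2.0.x location first, then the 1.8.x location.
    static final String[] CANDIDATES = {
        "org.apache.pdfbox.text.PDFTextStripper",
        "org.apache.pdfbox.util.PDFTextStripper"
    };

    /** Returns the first candidate name that the given class loader can resolve, if any. */
    static Optional<String> locate(ClassLoader loader) {
        for (String name : CANDIDATES) {
            try {
                // initialize=false: resolve the class without running its static initializers
                Class.forName(name, false, loader);
                return Optional.of(name);
            } catch (ClassNotFoundException | LinkageError e) {
                // Not available under this name; try the next candidate.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<String> found = locate(PdfTextStripperLocator.class.getClassLoader());
        System.out.println(found
                .map(name -> "Use: import " + name + ";")
                .orElse("PDFTextStripper not found; is a PDFBox jar on the classpath?"));
    }
}
```

Run it with the same classpath as the project. If it reports the 'org.apache.pdfbox.util' name, an old 1.8.x jar is probably still being picked up ahead of the 2.0.2 one.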

Glad it works. Please delete your question, since this is rather trivial. Or answer it yourself. –

// PDFBox 2.0.x: both stripper classes live in org.apache.pdfbox.text
try (PDDocument document = PDDocument.load(new File("test.pdf"))) {
    document.getClass();
    if (!document.isEncrypted()) {
        PDFTextStripperByArea stripper = new PDFTextStripperByArea();
        stripper.setSortByPosition(true);
        PDFTextStripper Tstripper = new PDFTextStripper();
        String st = Tstripper.getText(document);
        System.out.println("Text:" + st);
    }
} catch (Exception e) {
    e.printStackTrace();
}

Source

2016-08-01 09:35:01 SerefAltindal

This is not an answer to the question. Also, 'document.getClass();' has no effect, and 'if (!document.isEncrypted())' is not needed. –
