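Reading a PDF file using Java

Can anyone tell me how to extract all the words (word by word) from a PDF file using Java?

The code below extracts the content from a PDF file and writes it to another PDF file. I want the program to write it to a text file instead.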

import java.io.FileOutputStream;
import java.io.IOException;

import com.itextpdf.text.*;
import com.itextpdf.text.pdf.*;

public class pdf {

    private static String INPUTFILE = "http://www.britishcouncil.org/learning-infosheets-medicine.pdf";
    private static String OUTPUTFILE = "c:/new3.pdf";

    public static void main(String[] args) throws DocumentException, IOException {
        // Create the output PDF document
        Document document = new Document();
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(OUTPUTFILE));
        document.open();

        // Open the source PDF and copy every page into the output as an image
        PdfReader reader = new PdfReader(INPUTFILE);
        int n = reader.getNumberOfPages();
        PdfImportedPage page;
        for (int i = 1; i <= n; i++) {
            page = writer.getImportedPage(reader, i);
            Image instance = Image.getInstance(page);
            document.add(instance);
        }
        document.close();
    }
}

Thanks in advance.

Source

2010-10-25 Rim

Possible duplicate of [How to read PDF files using Java](http://stackoverflow.com/questions/4784825/how-to-read-pdf-files-using-java) – Travis 2015-03-12 13:36:38

Take a look at this:

How to Read PDF File in Java (using the Apache PDFBox library)

Source

2010-10-25 14:26:05

PDFBox is great. – 2010-10-25 14:40:36

Use org.apache.pdfbox:

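import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.pdfbox.io.IOUtils;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Assumes PDFBox 2.x; the wrapper class name PdfToText is illustrative.
public class PdfToText {

    public static String convertPDFToTxt(String filePath) throws IOException {
        byte[] thePDFFileBytes = readFileAsBytes(filePath);
        // try-with-resources closes the document even if extraction fails
        try (PDDocument pddDoc = PDDocument.load(thePDFFileBytes)) {
            PDFTextStripper reader = new PDFTextStripper();
            return reader.getText(pddDoc);
        }
    }

    private static byte[] readFileAsBytes(String filePath) throws IOException {
        try (InputStream inputStream = new FileInputStream(filePath)) {
            return IOUtils.toByteArray(inputStream);
        }
    }
}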
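
Once the text is available as a String, getting it word by word into a text file needs only the standard library. The following is a minimal sketch of one possible approach, not code from the answer above: the class WordExtractor, its two methods, and the output file name words.txt are hypothetical, and it assumes that a "word" is any whitespace-separated token (punctuation stays attached to the word it touches).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of PDFBox or iText.
class WordExtractor {

    // Splits text on runs of whitespace (spaces, tabs, line breaks).
    // Assumption: punctuation is left attached to the neighbouring word.
    static List<String> splitWords(String text) {
        List<String> words = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) { // blank input yields one empty token
                words.add(token);
            }
        }
        return words;
    }

    // Writes one word per line to a UTF-8 text file.
    static void writeWords(List<String> words, Path outputFile) throws IOException {
        Files.write(outputFile, words, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the String returned by a PDF text extractor
        String extracted = "Medicine is studied\nat  many UK universities.";
        List<String> words = splitWords(extracted);
        writeWords(words, Paths.get("words.txt"));
        System.out.println(words.size() + " words written");
    }
}
```

In practice the stand-in string would be replaced by the text returned from convertPDFToTxt. If punctuation should be dropped, the split pattern would need to change accordingly.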

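Source

2017-02-15 20:37:25 dina

Can I read a PDF file partially? For example, only the first page, or up to the point where some text occurs, rather than reading the whole PDF file? That way I could avoid downloading the entire file. – vasilevich 2017-09-27 09:29:45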
