How to read a PDF file and write it to an OutputStream

I need to read a PDF file with the file path "C:\file.pdf" and write it to an OutputStream. What is the easiest way to do this?

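import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;

@Controller
public class ExportTlocrt {

    @Autowired
    private PhoneBookService phoneBookSer; // the project's own service class

    private void setResponseHeaderTlocrtPDF(HttpServletResponse response) {
        response.setContentType("application/pdf");
        response.setHeader("content-disposition", "attachment; filename=Tlocrt.pdf");
    }

    @RequestMapping(value = "/exportTlocrt.html", method = RequestMethod.POST)
    public void exportTlocrt(Model model, HttpServletResponse response, HttpServletRequest request) {

        setResponseHeaderTlocrtPDF(response);
        File f = new File("C:\\Tlocrt.pdf");

        // try-with-resources closes both streams, even if an exception is thrown
        try (InputStream is = new FileInputStream(f);
             OutputStream os = response.getOutputStream()) {
            byte[] buf = new byte[8192];
            int c;
            // read(...) returns -1 at end of file
            while ((c = is.read(buf, 0, buf.length)) != -1) {
                os.write(buf, 0, c);
            }
            os.flush();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}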
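On Java 7 and later, the read/write loop can be delegated to the standard library: java.nio.file.Files.copy(Path, OutputStream) writes every byte of a file to the given stream and returns the number of bytes copied. A minimal sketch, assuming the target is any OutputStream (for example the one from response.getOutputStream()); the class PdfStreamer and its method writeFile are hypothetical names:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PdfStreamer {

    // Hypothetical helper: copies every byte of the file at `path` to `out`
    // and returns the byte count. The stream is flushed but not closed;
    // closing is left to its owner (e.g. the servlet container).
    public static long writeFile(Path path, OutputStream out) throws IOException {
        long written = Files.copy(path, out);
        out.flush();
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path pdf = Paths.get("C:\\file.pdf"); // path from the question
        try (OutputStream out = Files.newOutputStream(Paths.get("copy.pdf"))) {
            long n = writeFile(pdf, out);
            System.out.println(n + " bytes written");
        }
    }
}
```

Returning the count makes an unexpected 0-byte result easy to log.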


Source: 2013-06-03, Juraj Vlahović

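Your question seems to ask for a routine that copies a file to a dedicated OutputStream, and @Pheonix's answer shows how to do that. Is there any reason you tagged your question [pdf], let alone [itext]? – mkl

I use iText in my project, so I thought it might be useful for this example. I was wrong. –

In fact, just as @Stephan's answer proposes a solution using PDFBox, you could also use iText to first parse the whole PDF and then serialize it again. But copying a PDF file with a PDF library (PDFBox or iText) wastes a lot of resources and may alter the PDF file in question. – mkl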
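One way to check that a plain byte copy leaves the PDF untouched is to compare a cryptographic digest of the original and the copy. A minimal sketch using java.security.MessageDigest with SHA-256; the algorithm choice and the names DigestCheck, sha256 and sameContent are assumptions for illustration, not something from the thread:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class DigestCheck {

    // Hypothetical helper: SHA-256 digest of a file's complete contents.
    // Reads the whole file into memory, which is acceptable for typical PDFs.
    public static byte[] sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        return md.digest(Files.readAllBytes(file));
    }

    // True if both files have the same digest, i.e. (with overwhelming
    // probability) exactly the same bytes.
    public static boolean sameContent(Path a, Path b) throws IOException, NoSuchAlgorithmException {
        return Arrays.equals(sha256(a), sha256(b));
    }

    // Usage: java DigestCheck original.pdf copy.pdf
    public static void main(String[] args) throws Exception {
        Path original = Paths.get(args[0]);
        Path copy = Paths.get(args[1]);
        Files.copy(original, copy, StandardCopyOption.REPLACE_EXISTING); // raw byte copy
        System.out.println("identical: " + sameContent(original, copy));
    }
}
```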

import java.io.*;

public class FileRead {

    public static void main(String[] args) throws IOException {

        File f = new File("C:\\Documents and Settings\\abc\\Desktop\\abc.pdf");

        // try-with-resources closes both streams, even if an exception is thrown
        try (InputStream is = new FileInputStream(f);
             OutputStream oos = new FileOutputStream("test.pdf")) {
            byte[] buf = new byte[8192];
            int c;
            // read(...) returns -1 at end of file
            while ((c = is.read(buf, 0, buf.length)) != -1) {
                oos.write(buf, 0, c);
            }
            oos.flush();
        }
        System.out.println("stop");
    }
}

This is by far the simplest way. Hope this helps.
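The loop above can be factored into a reusable method that works for any pair of streams, whether the target is a file or a servlet response. This sketch (the names StreamCopy and copy are hypothetical) tests read against -1, the documented end-of-stream value, flushes once at the end, and returns the number of bytes copied. Closing the streams is left to the caller, ideally with try-with-resources:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {

    // Hypothetical helper: copies everything from `in` to `out` through an
    // 8 KB buffer and returns the total byte count. Neither stream is closed.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        // read(...) may return fewer bytes than requested; -1 means end of stream.
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        out.flush(); // one flush at the end instead of one per chunk
        return total;
    }
}
```

On Java 9 and later, in.transferTo(out) performs the same copy and also returns the byte count.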

Source: 2013-06-03 07:32:50, ankurtr

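Thanks for the help. This is exactly what I needed. –

Is something missing from your code, or am I missing something? The file I get has 0 bytes and I can't open it. I'll edit my question with the code. –

@JurajVlahović: Works perfectly. – ankurtr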
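When a copied or downloaded file comes out empty or will not open, a quick first check is its size and its first bytes: a well-formed PDF normally starts with the header "%PDF-". A minimal diagnostic sketch; the names PdfSanityCheck and looksLikePdf are hypothetical, and the check only looks at the signature, not at whether the rest of the file is valid:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class PdfSanityCheck {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Hypothetical helper: true if the file exists, is at least 5 bytes long
    // and begins with "%PDF-".
    public static boolean looksLikePdf(Path file) throws IOException {
        if (!Files.isRegularFile(file) || Files.size(file) < PDF_MAGIC.length) {
            return false; // missing, not a regular file, or too short (e.g. 0 bytes)
        }
        byte[] head = new byte[PDF_MAGIC.length];
        try (InputStream in = Files.newInputStream(file)) {
            int off = 0;
            // read() may return fewer bytes than asked for, so loop until the header is full
            while (off < head.length) {
                int n = in.read(head, off, head.length - off);
                if (n == -1) {
                    return false;
                }
                off += n;
            }
        }
        return Arrays.equals(head, PDF_MAGIC);
    }
}
```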

You can use Apache PDFBox; it is easy to use and performs well.

Here is an example that extracts the text from a PDF file (you can read more here):

import java.io.*;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.util.*; // PDFBox 1.x; in 2.x PDFTextStripper is in org.apache.pdfbox.text

public class PDFTest {

    public static void main(String[] args) {
        File input = new File("C:\\Invoice.pdf");     // The PDF file from which to extract text
        File output = new File("C:\\SampleText.txt"); // The text file where the extracted text is stored
        PDDocument pd = null;
        try {
            pd = PDDocument.load(input);
            System.out.println(pd.getNumberOfPages());
            System.out.println(pd.isEncrypted());
            pd.save("CopyOfInvoice.pdf"); // Creates a copy called "CopyOfInvoice.pdf"
            PDFTextStripper stripper = new PDFTextStripper();
            // try-with-resources closes (and thereby flushes) the writer
            try (Writer wr = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(output)))) {
                stripper.writeText(pd, wr);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (pd != null) {
                try {
                    pd.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}

UPDATE:

You can get the text with PDFTextStripper:

PDFTextStripper reader = new PDFTextStripper(); 
String pageText = reader.getText(pd); // pd is the PDDocument created above
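The String returned by getText holds characters, not bytes, so sending it to an OutputStream requires choosing an encoding. A minimal standard-library sketch, assuming UTF-8 is the desired encoding; the names TextToStream and writeText are hypothetical, and the text argument would be the pageText from the snippet above:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class TextToStream {

    // Hypothetical helper: encodes `text` as UTF-8 and writes it to `out`.
    // The writer is flushed but deliberately not closed, because closing it
    // would also close the caller's OutputStream.
    public static void writeText(String text, OutputStream out) throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        w.write(text);
        w.flush();
    }
}
```

The same OutputStreamWriter wrapping can be passed directly to stripper.writeText(pd, writer) from the first example, which avoids building the whole text as one String.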

Source: 2013-06-03 07:26:02, Stephan

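The PDF contains images with some small text. I don't need to write it to a txt or any other file, just to an OutputStream. –

This is just an example; you can easily modify it. – Stephan

See my updated answer. – Stephan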
