2013-09-27 156 views
0

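Error when trying to run a PDFBox program

I am trying to run the PDFBox example from this page: http://www.printmyfolders.com/Home/PDFBox-Tutorial to extract text from a PDF file. When I try to run it, I get this error: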

org.apache.pdfbox.exceptions.WrappedIOException 
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:245) 
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1192) 
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1159) 
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1130) 
    at GetPos.main(GetPos.java:14) 
Caused by: java.lang.ArrayIndexOutOfBoundsException 
    at java.lang.System.arraycopy(libgcj.so.10) 
    at java.io.ByteArrayOutputStream.write(libgcj.so.10) 
    at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:172) 
    at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:98) 
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:295) 
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:237) 
    at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:172) 
    at org.apache.pdfbox.pdfparser.PDFXrefStreamParser.<init>(PDFXrefStreamParser.java:61) 
    at org.apache.pdfbox.pdfparser.PDFParser.parseXrefStream(PDFParser.java:848) 
    at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:576) 
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:188) 
    ...4 more 

What does this mean? The first example, which creates a blank PDF, works fine.

Answers

0

I used the tutorial's examples to generate a PDF containing text and then read the text back:

package com.mycompany.mavenproject; 

import java.io.File; 
import junit.framework.Test; 
import junit.framework.TestCase; 
import junit.framework.TestSuite; 
import org.apache.pdfbox.pdmodel.PDDocument; 
import org.apache.pdfbox.pdmodel.PDPage; 
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream; 
import org.apache.pdfbox.pdmodel.font.PDFont; 
import org.apache.pdfbox.pdmodel.font.PDType1Font; 
import org.apache.pdfbox.util.PDFTextStripper; 

/** 
* Unit test for simple App. 
*/ 
public class AppTest
        extends TestCase {

    public static Test suite() {
        return new TestSuite(AppTest.class);
    }

    // Writes a one-line PDF, loads it again, and compares the extracted text.
    public void test() throws Exception {
        final String fileName = "PDFWithText.pdf";
        writeDocument(fileName);
        PDDocument pd = PDDocument.load(new File(fileName));
        try {
            PDFTextStripper stripper = new PDFTextStripper();
            String text = stripper.getText(pd);
            assertEquals("Hello from www.printmyfolders.com", text.trim());
        } finally {
            pd.close(); // release the loaded document even if the assertion fails
        }
    }

    // Creates a single-page PDF containing one line of Helvetica Bold text.
    private void writeDocument(String fileName) throws Exception {
        PDDocument doc = new PDDocument();
        PDPage page = new PDPage();
        doc.addPage(page);
        PDFont font = PDType1Font.HELVETICA_BOLD;

        PDPageContentStream content = new PDPageContentStream(doc, page);
        content.beginText();
        content.setFont(font, 12);
        content.moveTextPositionByAmount(100, 700);
        content.drawString("Hello from www.printmyfolders.com");
        content.endText();
        content.close();

        doc.save(fileName);
        doc.close();
    }
}

This works without any exception. Since the exception bubbles up from the load method, check that your PDF is not malformed.
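As a rough first check for a malformed file, one option is to confirm that it begins with the `%PDF-` marker that PDF files start with. The sketch below uses only the Java standard library and assumes Java 7 or later; the class name `PdfHeaderCheck`, the method `looksLikePdf`, and the default file name are made up for illustration. It only catches gross problems, such as an HTML error page saved with a .pdf extension, and would not detect a damaged compressed stream inside an otherwise valid file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of PDFBox.
public class PdfHeaderCheck {

    /** Returns true if the file's first five bytes are the ASCII marker "%PDF-". */
    public static boolean looksLikePdf(Path file) throws IOException {
        byte[] expected = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        byte[] actual = new byte[expected.length];
        try (InputStream in = Files.newInputStream(file)) {
            int read = 0;
            // read() may return fewer bytes than requested, so loop until full.
            while (read < actual.length) {
                int n = in.read(actual, read, actual.length - read);
                if (n < 0) {
                    return false; // file is shorter than the marker
                }
                read += n;
            }
        }
        return Arrays.equals(expected, actual);
    }

    public static void main(String[] args) throws IOException {
        // "somefile.pdf" is a placeholder; pass the real path as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "somefile.pdf");
        System.out.println(file + " starts with %PDF-: " + looksLikePdf(file));
    }
}
```

If this prints `false`, the file is not a PDF in the form a parser expects. If it prints `true`, the file may still be damaged further in.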

+0

Sorry, but it doesn't work. I'm not a Java developer, so maybe I'm missing something? Could you give me the complete code of your *.java file? – Footniko

+0

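Hmm.. I ran it as a unit test in an empty Maven module (NetBeans). The only code missing was the class definition and constructor. I've edited the original post to include the complete .java file. – Origineil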

0

Use a temp directory:

parser.setTempDirectory(new File(directoryPath)); 

For example:

File in = new File("somefile.pdf"); 
InputStream fin = new FileInputStream(in); 
PDFParser parser = new PDFParser(fin); 
parser.setTempDirectory(new File(tempDirectoryPath)); 
parser.parse(); 
PDDocument document = parser.getPDDocument();
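
The snippet above leaves `tempDirectoryPath` undefined. One possible way to obtain a writable directory, assuming Java 7 or later, is `Files.createTempDirectory` from the standard library. The class name `TempDirExample`, the method `createScratchDir`, and the `pdfbox-` prefix are illustrative choices, not anything PDFBox requires.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical helper, not part of PDFBox.
public class TempDirExample {

    /** Creates a new directory under the system temp location and returns it as a File. */
    public static File createScratchDir(String prefix) throws IOException {
        File dir = Files.createTempDirectory(prefix).toFile();
        dir.deleteOnExit(); // removed at JVM exit only if it is empty by then
        return dir;
    }

    public static void main(String[] args) throws IOException {
        File tempDir = createScratchDir("pdfbox-");
        System.out.println("Scratch directory: " + tempDir.getAbsolutePath());
        // The returned File could then be handed to parser.setTempDirectory(tempDir).
    }
}
```

Because the directory is created fresh on each run, it is guaranteed to exist and be writable by the current user, which avoids passing a path that does not exist yet.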