2013-02-19 107 views

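Reading PDF file attachment annotations with iTextSharp

I have the following problem. I have a PDF file that carries an XML file attached as an annotation. It is not an embedded file, but an annotation. I tried to read it with the code from the following link: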

iTextSharp - how to open/read/extract a file attachment?

It works for embedded files, but not for file attachments stored as annotations.

I googled for extracting annotations from a PDF and found the following link: Reading PDF Annotations with iText

So the annotation type is "File Attachment Annotation".

Can someone show a working example?

Thanks in advance for any help.

Answer


As so often with questions about iText and iTextSharp, the first place to look is the keyword list on itextpdf.com. There you will find File attachment, extract attachments, which references two Java samples from iText in Action, 2nd Edition:

Similar C# versions are available in the Webified iTextSharp Examples.

KubrickDvds contains the following method, extractAttachments/ExtractAttachments, which extracts file attachment annotations:

Java:

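import java.io.FileOutputStream;
import java.io.IOException;
import com.itextpdf.text.pdf.*; // iText 5.x

/**
 * Extracts file attachment annotations from an existing PDF.
 * @param src the path to the existing PDF
 */
public void extractAttachments(String src) throws IOException {
    // PATH is a format string with one %s placeholder, defined elsewhere in the sample class (not shown here).
    PdfReader reader = new PdfReader(src);
    try {
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            PdfArray array = reader.getPageN(i).getAsArray(PdfName.ANNOTS);
            if (array == null) continue; // this page has no annotations
            for (int j = 0; j < array.size(); j++) {
                PdfDictionary annot = array.getAsDict(j);
                if (annot == null
                        || !PdfName.FILEATTACHMENT.equals(annot.getAsName(PdfName.SUBTYPE))) {
                    continue; // not a file attachment annotation
                }
                // /FS is the file specification; its /EF entry maps keys such as /F to the embedded streams
                PdfDictionary fs = annot.getAsDict(PdfName.FS);
                PdfDictionary refs = (fs == null) ? null : fs.getAsDict(PdfName.EF);
                if (refs == null) continue;
                for (PdfName name : refs.getKeys()) {
                    // the same key in the file specification holds the file name
                    String path = String.format(PATH, fs.getAsString(name).toString());
                    // try-with-resources closes the stream even if writing fails
                    try (FileOutputStream fos = new FileOutputStream(path)) {
                        fos.write(PdfReader.getStreamBytes((PRStream) refs.getAsStream(name)));
                    }
                }
            }
        }
    } finally {
        reader.close();
    }
}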

C#:

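// using iTextSharp.text.pdf;
// ZipFile is not part of iTextSharp; it presumably comes from a ZIP library such as DotNetZip (Ionic.Zip).

/**
 * Extracts file attachment annotations from an existing PDF.
 * @param src the content of the existing PDF
 * @param zip the ZipFile object to add the extracted files to
 */
public void ExtractAttachments(byte[] src, ZipFile zip) {
    PdfReader reader = new PdfReader(src);
    try {
        for (int i = 1; i <= reader.NumberOfPages; i++) {
            PdfArray array = reader.GetPageN(i).GetAsArray(PdfName.ANNOTS);
            if (array == null) continue; // this page has no annotations
            for (int j = 0; j < array.Size; j++) {
                PdfDictionary annot = array.GetAsDict(j);
                if (annot == null
                    || !PdfName.FILEATTACHMENT.Equals(annot.GetAsName(PdfName.SUBTYPE)))
                {
                    continue; // not a file attachment annotation
                }
                // /FS is the file specification; its /EF entry maps keys such as /F to the embedded streams
                PdfDictionary fs = annot.GetAsDict(PdfName.FS);
                PdfDictionary refs = (fs == null) ? null : fs.GetAsDict(PdfName.EF);
                if (refs == null) continue;
                foreach (PdfName name in refs.Keys) {
                    // the same key in the file specification holds the file name
                    zip.AddEntry(
                        fs.GetAsString(name).ToString(),
                        PdfReader.GetStreamBytes((PRStream)refs.GetAsStream(name))
                    );
                }
            }
        }
    } finally {
        reader.Close();
    }
}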
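
One caveat for the Java version, which builds the output path from the file name stored in the PDF: that name comes from the document itself and may contain directory parts. A minimal sketch of reducing it to a bare file name before writing, assuming a hypothetical helper class `AttachmentNames` with a method `safeName` (neither is part of iText or of the original sample):

```java
final class AttachmentNames {

    private AttachmentNames() {
    }

    /**
     * Reduces a file name taken from a PDF to its last path segment.
     * Falls back to "attachment" when nothing usable is left.
     */
    static String safeName(String raw) {
        if (raw == null) {
            return "attachment";
        }
        // Treat both '/' and '\' as separators, whatever the platform
        String normalized = raw.replace('\\', '/');
        String base = normalized.substring(normalized.lastIndexOf('/') + 1);
        if (base.isEmpty() || base.equals(".") || base.equals("..")) {
            return "attachment";
        }
        return base;
    }
}
```

In the Java method above, the result of `fs.getAsString(name).toString()` could be passed through `AttachmentNames.safeName(...)` before it is handed to `String.format(PATH, ...)`.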

KubrickDocumentary contains the following method, extractDocLevelAttachments/ExtractDocLevelAttachments, which extracts document-level attachments:

Java:

/**
 * Extracts document level attachments
 * @param filename  a file from which document level attachments will be extracted
 * @throws IOException
 */
public void extractDocLevelAttachments(String filename) throws IOException {
    PdfReader reader = new PdfReader(filename);
    try {
        PdfDictionary root = reader.getCatalog();
        PdfDictionary documentnames = root.getAsDict(PdfName.NAMES);
        PdfDictionary embeddedfiles = (documentnames == null)
            ? null : documentnames.getAsDict(PdfName.EMBEDDEDFILES);
        // Only a flat /Names array is handled here; a name tree split into /Kids is not traversed
        PdfArray filespecs = (embeddedfiles == null)
            ? null : embeddedfiles.getAsArray(PdfName.NAMES);
        if (filespecs == null) return; // no document-level attachments found
        // /Names alternates [name1, filespec1, name2, filespec2, ...], so step in pairs
        for (int i = 0; i + 1 < filespecs.size(); i += 2) {
            PdfDictionary filespec = filespecs.getAsDict(i + 1);
            PdfDictionary refs = (filespec == null) ? null : filespec.getAsDict(PdfName.EF);
            if (refs == null) continue;
            for (PdfName key : refs.getKeys()) {
                String path = String.format(PATH, filespec.getAsString(key).toString());
                PRStream stream = (PRStream) PdfReader.getPdfObject(refs.getAsIndirectObject(key));
                // try-with-resources closes the stream even if writing fails
                try (FileOutputStream fos = new FileOutputStream(path)) {
                    fos.write(PdfReader.getStreamBytes(stream));
                }
            }
        }
    } finally {
        reader.close();
    }
}

C#:

/**
 * Extracts document level attachments
 * @param pdf the PDF from which document level attachments will be extracted
 * @param zip the ZipFile object to add the extracted files to
 */
public void ExtractDocLevelAttachments(byte[] pdf, ZipFile zip) {
    PdfReader reader = new PdfReader(pdf);
    try {
        PdfDictionary root = reader.Catalog;
        PdfDictionary documentnames = root.GetAsDict(PdfName.NAMES);
        PdfDictionary embeddedfiles = (documentnames == null)
            ? null : documentnames.GetAsDict(PdfName.EMBEDDEDFILES);
        // Only a flat /Names array is handled here; a name tree split into /Kids is not traversed
        PdfArray filespecs = (embeddedfiles == null)
            ? null : embeddedfiles.GetAsArray(PdfName.NAMES);
        if (filespecs == null) return; // no document-level attachments found
        // /Names alternates [name1, filespec1, name2, filespec2, ...], so step in pairs
        for (int i = 0; i + 1 < filespecs.Size; i += 2) {
            PdfDictionary filespec = filespecs.GetAsDict(i + 1);
            PdfDictionary refs = (filespec == null) ? null : filespec.GetAsDict(PdfName.EF);
            if (refs == null) continue;
            foreach (PdfName key in refs.Keys) {
                PRStream stream = (PRStream) PdfReader.GetPdfObject(
                    refs.GetAsIndirectObject(key)
                );
                zip.AddEntry(
                    filespec.GetAsString(key).ToString(),
                    PdfReader.GetStreamBytes(stream)
                );
            }
        }
    } finally {
        reader.Close();
    }
}

(For some reason the C# samples put the extracted files into a ZIP file, while the Java versions put them into the file system... oh well...)
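
If the ZIP behaviour of the C# samples is wanted on the Java side as well, the standard library is enough. A minimal sketch, assuming the attachments have already been collected into a map from file name to content; the class `AttachmentZip` and its method `zip` are hypothetical names, not part of iText:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class AttachmentZip {

    private AttachmentZip() {
    }

    /**
     * Packs the given files (name mapped to content) into an in-memory ZIP archive.
     * Entry names must be unique; ZipOutputStream rejects duplicates.
     */
    static byte[] zip(Map<String, byte[]> files) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zos.putNextEntry(new ZipEntry(file.getKey()));
                zos.write(file.getValue());
                zos.closeEntry();
            }
        } catch (IOException e) {
            // Writing to a byte array should not fail, but duplicate names do end up here
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

Inside the extraction loops, the bytes returned by `PdfReader.getStreamBytes(...)` would be put into such a map instead of being written to a `FileOutputStream`.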


OK, thanks. It works perfectly. The ExtractAttachments function is what I needed. – 2013-02-19 21:24:54