How do I get the text from sticky notes using iTextSharp?

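I'm trying to extract all the text from a PDF file using iTextSharp. At the moment I can only get the actual text on the page, not the text contained in user comments, or "sticky notes" as Adobe calls them. Is there a way to do this? Here is my code so far, but I just get empty strings: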

PdfReader pdfRead = new PdfReader(pdfFilePath);
AcroFields form = pdfRead.AcroFields;

string txt = "";
for (int page = 1; page <= pdfRead.NumberOfPages; ++page)
{
    PdfDictionary pagedic = pdfRead.GetPageN(page);
    PdfArray annotarray = (PdfArray)PdfReader.GetPdfObject(pagedic.Get(PdfName.ANNOTS));

    if (annotarray == null || annotarray.Size == 0)
        continue;

    foreach (PdfObject A in annotarray.ArrayList)
    {
        PdfDictionary AnnotationDictionary = (PdfDictionary)PdfReader.GetPdfObject(A);

        txt += AnnotationDictionary.GetAsString(PdfName.NOTE);
        txt += "\n";
    }
}

Source

2013-07-01 gaynorvader

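// Only text annotations ("sticky notes") are of interest here.
if (PdfName.TEXT.Equals(AnnotationDictionary.GetAsName(PdfName.SUBTYPE)))
{
    // /T holds the title and /Contents the note text; either entry may be missing.
    PdfString title = AnnotationDictionary.GetAsString(PdfName.T);
    PdfString contents = AnnotationDictionary.GetAsString(PdfName.CONTENTS);
    string Title = title == null ? "" : title.ToUnicodeString();
    string Content = contents == null ? "" : contents.ToUnicodeString();
}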

Source

2013-09-16 13:10:30

I don't work in C#, but you can find the counterpart here (the file in this example is pages.pdf). The output of this example is:

Annotation 1 
/Contents: This is a post-it annotation 
/Subtype: /Text 
/Rect: [36, 768, 56, 788] 
/T: Example 
Annotation 2 
/C: [0, 0, 1] 
/Border: [0, 0, 0] 
/A: Dictionary 
/Subtype: /Link 
/Rect: [66.67, 785.52, 98, 796.62]

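The first annotation is a sticky note annotation (in ISO-32000-1 terms, a text annotation). The key you are looking for isn't PdfName.NOTE, but PdfName.T for the title and PdfName.CONTENTS for the contents.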
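Putting the pieces of this thread together, a minimal sketch of the whole loop might look like the following. It assumes iTextSharp 5.x (where `GetAsArray`, `GetAsDict`, `GetAsName` and `ToUnicodeString` are available); the class name `StickyNoteReader` and the method `ExtractStickyNotes` are hypothetical names chosen for illustration, and the "title: contents" output format is likewise just an assumption.

```csharp
using System.Text;
using iTextSharp.text.pdf;

public static class StickyNoteReader
{
    // Hypothetical helper: returns one "title: contents" line per text annotation.
    public static string ExtractStickyNotes(string pdfFilePath)
    {
        StringBuilder sb = new StringBuilder();
        PdfReader reader = new PdfReader(pdfFilePath);
        try
        {
            for (int page = 1; page <= reader.NumberOfPages; ++page)
            {
                PdfDictionary pageDict = reader.GetPageN(page);

                // GetAsArray resolves indirect references and returns null
                // when the page has no /Annots entry.
                PdfArray annots = pageDict.GetAsArray(PdfName.ANNOTS);
                if (annots == null)
                    continue;

                for (int i = 0; i < annots.Size; ++i)
                {
                    PdfDictionary annot = annots.GetAsDict(i);
                    if (annot == null)
                        continue;

                    // Skip links, highlights, etc.; keep only /Subtype /Text.
                    if (!PdfName.TEXT.Equals(annot.GetAsName(PdfName.SUBTYPE)))
                        continue;

                    // /T is the title, /Contents the note text; both are optional.
                    PdfString title = annot.GetAsString(PdfName.T);
                    PdfString contents = annot.GetAsString(PdfName.CONTENTS);

                    sb.Append(title == null ? "" : title.ToUnicodeString());
                    sb.Append(": ");
                    sb.AppendLine(contents == null ? "" : contents.ToUnicodeString());
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return sb.ToString();
    }
}
```

Calling `StickyNoteReader.ExtractStickyNotes(pdfFilePath)` would then return the sticky-note text only; the page text itself still has to be extracted separately.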

Source

2013-07-01 17:12:59
