2011-07-05 191 views
3

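Editing hyperlinks and anchors in a PDF using iTextSharp

I am using the iTextSharp library with C#/.NET to split a PDF file.

Consider a PDF file named sample.pdf that contains 72 pages. Some of its pages contain hyperlinks that navigate to other pages. For example, page 4 has three hyperlinks that, when clicked, navigate to pages 24, 27, and 28 respectively. Like page 4, nearly 12 pages contain such hyperlinks.

Using iTextSharp, I have split this PDF into 72 separate files, saved as 1.pdf, 2.pdf, ..., 72.pdf. So when one of those hyperlinks is clicked in 4.pdf, I need the viewer to navigate to 24.pdf, 27.pdf, or 28.pdf.

Please help me with this. How can I edit and set the hyperlinks in 4.pdf so that they navigate to the corresponding PDF files?

Thanks, Ashok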

Answers

6

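What you want is entirely possible. It will require you to work with low-level PDF objects (PdfDictionary, PdfArray, and so on).

Whenever someone needs to work with those objects, I always refer them to the PDF Reference. In your case, you'll want to look at chapter 7 (particularly section 3) and chapter 12, sections 3 (document-level navigation) and 5 (annotations).

Assuming you've read that, here is what you need to do: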

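  1. Step through each page's annotation array (in the original document, before splitting it).
    1. Find all the link annotations and their destinations.
    2. Build a new destination for that link that corresponds to the new file.
    3. Write the new destination into the link annotation.
  2. Write this page to a new PDF using PdfCopy (which copies annotations as well as page content).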
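
Step 2 might look roughly like the following in C#. This is a minimal sketch rather than a tested implementation: the class name `PdfSplitter`, the method name `SplitIntoPages`, and the `N.pdf` output naming are illustrative assumptions, and it assumes the iTextSharp 5.x `PdfCopy` API. It copies each page as it currently exists in the reader, so any link edits from step 1 have to be made on the reader's page dictionaries before this runs.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PdfSplitter
{
    // Hypothetical helper: writes each page of the source PDF to its own
    // file (1.pdf, 2.pdf, ...) inside outputFolder.
    public static void SplitIntoPages(PdfReader reader, string outputFolder)
    {
        Directory.CreateDirectory(outputFolder);

        for (int page = 1; page <= reader.NumberOfPages; page++)
        {
            string outPath = Path.Combine(outputFolder, page + ".pdf");
            using (var fs = new FileStream(outPath, FileMode.Create, FileAccess.Write))
            {
                var doc = new Document();
                var copy = new PdfCopy(doc, fs);
                doc.Open();
                // The imported page carries its annotations along with its content.
                copy.AddPage(copy.GetImportedPage(reader, page));
                doc.Close();
            }
        }
    }
}
```

Passing the already-open `PdfReader` in, instead of a file path, is a deliberate choice here: the same reader instance whose annotations were edited is the one that gets copied.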

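Step 1.1 isn't simple. There are several different formats for "local goto" annotations, and you need to determine which page a given link points to. Some links use the PDF equivalent of "next page" or "previous page", while others contain a reference to a specific page. That reference will be an "indirect object reference", not a page number.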

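To determine the page number from a page reference, you need to... ouch. OK. The most efficient way is to call PdfReader.GetPageRef(int pageNum) for each page in the original document and cache the results in a map (reference -> pageNum). You can then build a "remote goto" link by creating a remote goto PdfAction and writing it into the link annotation's "A" (action) entry, removing whatever was there before (probably a "Dest").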
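
The map and the rewrite described above could be sketched in C# as follows. Treat this as a hedged starting point under several assumptions: `LinkRewriter`, `BuildPageMap`, and `RelinkPage` are made-up names; the page reference is fetched with `GetPageOrigRef`, which is the iTextSharp 5.x call I know of for this purpose; and the split files are assumed to be named `N.pdf` and to sit next to each other. Only explicit destinations (a `/Dest` array or the `/D` array of a `/GoTo` action) are handled. Named destinations and named actions such as "next page" are skipped.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;

public static class LinkRewriter
{
    // Maps the object number of each page's indirect reference to its 1-based page number.
    public static Dictionary<int, int> BuildPageMap(PdfReader reader)
    {
        var map = new Dictionary<int, int>();
        for (int page = 1; page <= reader.NumberOfPages; page++)
        {
            map[reader.GetPageOrigRef(page).Number] = page;
        }
        return map;
    }

    // Rewrites the explicit local links on one page so they open "<targetPage>.pdf".
    public static void RelinkPage(PdfReader reader, int pageNumber, Dictionary<int, int> pageMap)
    {
        PdfArray annots = reader.GetPageN(pageNumber).GetAsArray(PdfName.ANNOTS);
        if (annots == null) return;

        for (int i = 0; i < annots.Size; i++)
        {
            PdfDictionary annot = annots.GetAsDict(i);
            if (annot == null || !PdfName.LINK.Equals(annot.GetAsName(PdfName.SUBTYPE))) continue;

            // The destination is either a direct /Dest array or the /D array of a /GoTo action.
            PdfArray dest = annot.GetAsArray(PdfName.DEST);
            if (dest == null)
            {
                PdfDictionary action = annot.GetAsDict(PdfName.A);
                if (action == null || !PdfName.GOTO.Equals(action.GetAsName(PdfName.S))) continue;
                dest = action.GetAsArray(PdfName.D);
            }
            if (dest == null || dest.Size == 0) continue;

            // The first element of an explicit destination is the page's indirect reference.
            PdfIndirectReference pageRef = dest.GetAsIndirectObject(0);
            int targetPage;
            if (pageRef == null || !pageMap.TryGetValue(pageRef.Number, out targetPage)) continue;

            // Each split file holds a single page, so the remote target is always page 1.
            annot.Remove(PdfName.DEST);
            annot.Put(PdfName.A, new PdfAction(targetPage + ".pdf", 1));
        }
    }
}
```

Calling `BuildPageMap` once and then `RelinkPage` for every page, before any page is copied out, keeps the lookup at one dictionary access per link.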

I don't speak C# very well, so I'll leave the actual implementation to you.

+0

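Hi Mark, thanks for your help. I am going through the document now. Could you provide some sample code? I need to finish and deliver this as soon as possible. –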

+0

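@MarkStorer - Step 1.1 is where I get lost; between iTextSharp's objects and its poor documentation, the PDF Reference hasn't been much help. I can find all of the 'subtype = /link' annotations, but from there they come in different types whose keys/elements differ. I edited and expanded on this question here: http://stackoverflow.com/questions/5579051/read-internal-link-annotation-using-itextsharp –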

3

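OK, based on @Mark Storer's answer, here is some starter code. The first method creates a sample 10-page PDF with some links on the first page that jump to different parts of the PDF, so that we have something to work with. The second method opens the PDF created by the first method and walks through each annotation, trying to work out which page the annotation links to and writing that to the TRACE window. The code is in VB but should be easy to convert to C# if needed. It targets iTextSharp 5.1.1.0.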

If I get a chance I might try to take this further and actually split and re-link things, but I don't have time right now.

Option Explicit On 
Option Strict On 

Imports iTextSharp.text 
Imports iTextSharp.text.pdf 
Imports System.IO 

Public Class Form1 
    ''//Folder that we are working in 
    Private Shared ReadOnly WorkingFolder As String = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Hyperlinked PDFs") 
    ''//Sample PDF 
    Private Shared ReadOnly BaseFile As String = Path.Combine(WorkingFolder, "Sample.pdf") 

    Private Shared Sub CreateSamplePdf() 
     ''//Create our output directory if it does not exist 
     Directory.CreateDirectory(WorkingFolder) 

     ''//Create our sample PDF 
     Using Doc As New iTextSharp.text.Document(PageSize.LETTER) 
      Using FS As New FileStream(BaseFile, FileMode.Create, FileAccess.Write, FileShare.Read) 
       Using writer = PdfWriter.GetInstance(Doc, FS) 
        Doc.Open() 

        ''//Turn our hyperlinks blue 
        Dim BlueFont As Font = FontFactory.GetFont("Arial", 12, iTextSharp.text.Font.NORMAL, iTextSharp.text.BaseColor.BLUE) 

        ''//Create 10 pages with simple labels on them 
        For I = 1 To 10 
         Doc.NewPage() 
         Doc.Add(New Paragraph(String.Format("Page {0}", I))) 
         ''//On the first page add some links 
         If I = 1 Then 

          ''//Go to pages relative to this page 
          Doc.Add(New Paragraph(New Chunk("First Page", BlueFont).SetAction(New PdfAction(PdfAction.FIRSTPAGE)))) 

          Doc.Add(New Paragraph(New Chunk("Next Page", BlueFont).SetAction(New PdfAction(PdfAction.NEXTPAGE)))) 

          Doc.Add(New Paragraph(New Chunk("Prev Page", BlueFont).SetAction(New PdfAction(PdfAction.PREVPAGE)))) ''//This one does not make sense but is here for completeness 

          Doc.Add(New Paragraph(New Chunk("Last Page", BlueFont).SetAction(New PdfAction(PdfAction.LASTPAGE)))) 

          ''//Go to a specific hard-coded page number 
          Doc.Add(New Paragraph(New Chunk("Go to page 5", BlueFont).SetAction(PdfAction.GotoLocalPage(5, New PdfDestination(0), writer)))) 
         End If 
        Next 
        Doc.Close() 
       End Using 
      End Using 
     End Using 
    End Sub 
    Private Shared Sub ListPdfLinks() 

     ''//Setup some variables to be used later 
     Dim R As PdfReader 
     Dim PageCount As Integer 
     Dim PageDictionary As PdfDictionary 
     Dim Annots As PdfArray 

     ''//Open our reader 
     R = New PdfReader(BaseFile) 
     ''//Get the page count 
     PageCount = R.NumberOfPages 

     ''//Loop through each page 
     For I = 1 To PageCount 
      ''//Get the current page 
      PageDictionary = R.GetPageN(I) 

      ''//Get all of the annotations for the current page 
      Annots = PageDictionary.GetAsArray(PdfName.ANNOTS) 

      ''//Make sure we have something 
      If (Annots Is Nothing) OrElse (Annots.Size = 0) Then Continue For 

      ''//Loop through each annotation 
      For Each A In Annots.ArrayList 

       ''//I do not completely understand this but I think this turns an Indirect Reference into an actual object, but I could be wrong 
       ''//Anyway, convert the itext-specific object as a generic PDF object 
       Dim AnnotationDictionary = DirectCast(PdfReader.GetPdfObject(A), PdfDictionary) 

       ''//Make sure this annotation has a link 
       If Not AnnotationDictionary.Get(PdfName.SUBTYPE).Equals(PdfName.LINK) Then Continue For 

       ''//Make sure this annotation has an ACTION 
       If AnnotationDictionary.Get(PdfName.A) Is Nothing Then Continue For 

       ''//Get the ACTION for the current annotation 
       Dim AnnotationAction = DirectCast(AnnotationDictionary.Get(PdfName.A), PdfDictionary) 

       ''//Test if it is a named actions such as /FIRST, /LAST, etc 
       If AnnotationAction.Get(PdfName.S).Equals(PdfName.NAMED) Then 
        Trace.Write("GOTO:") 
        If AnnotationAction.Get(PdfName.N).Equals(PdfName.FIRSTPAGE) Then 
         Trace.WriteLine(1) 
        ElseIf AnnotationAction.Get(PdfName.N).Equals(PdfName.NEXTPAGE) Then 
         Trace.WriteLine(Math.Min(I + 1, PageCount)) ''//Any links that go past the end of the document should just go to the last page 
        ElseIf AnnotationAction.Get(PdfName.N).Equals(PdfName.LASTPAGE) Then 
         Trace.WriteLine(PageCount) 
        ElseIf AnnotationAction.Get(PdfName.N).Equals(PdfName.PREVPAGE) Then 
         Trace.WriteLine(Math.Max(I - 1, 1)) ''//Any links that go before the first page should just go to the first page 
        End If 


        ''//Otherwise see if its a GOTO page action 
       ElseIf AnnotationAction.Get(PdfName.S).Equals(PdfName.GOTO) Then 

        ''//Make sure that it has a destination 
        If AnnotationAction.GetAsArray(PdfName.D) Is Nothing Then Continue For 

        ''//Once again, not completely sure if this is the best route but the ACTION has a sub DESTINATION object that is an Indirect Reference. 
        ''//The code below gets that IR, asks the PdfReader to convert it to an actual page and then loop through all of the pages 
        ''//to see which page the IR points to. Very inefficient but I could not find a way to get the page number based on the IR. 

        ''//AnnotationAction.GetAsArray(PdfName.D) gets the destination 
        ''//AnnotationAction.GetAsArray(PdfName.D).ArrayList(0) get the indirect reference part of the destination (.ArrayList(1) has fitting options) 
        ''//DirectCast(AnnotationAction.GetAsArray(PdfName.D).ArrayList(0), PRIndirectReference) turns it into a PRIndirectReference 
        ''//The full line gets us an actual page object (actually I think it could be any type of pdf object but I have not tested that). 
        ''//BIG NOTE: This line really should have a bunch more sanity checks in place 
        Dim AnnotationReferencedPage = PdfReader.GetPdfObject(DirectCast(AnnotationAction.GetAsArray(PdfName.D).ArrayList(0), PRIndirectReference)) 
        Trace.Write("GOTO:") 
        ''//Re-loop through all of the pages in the main document comparing them to this page 
        For J = 1 To PageCount 
         If AnnotationReferencedPage.Equals(R.GetPageN(J)) Then 
          Trace.WriteLine(J) 
          Exit For 
         End If 
        Next 
       End If 
      Next 
     Next 

     ''//Close the reader now that we are done with it 
     R.Close() 
    End Sub 

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load 
     CreateSamplePdf() 
     ListPdfLinks() 
     Me.Close() 
    End Sub 
End Class 
+0

Hi Chris, thanks –

0

The function below uses iTextSharp to:

  1. Open the PDF
  2. Page through the PDF
  3. Check the annotations on each page for those that are anchors

Step #4 is where you insert whatever logic you want: updating the links, logging them, and so on.

/// <summary>Inspects PDF files for internal links. 
    /// </summary> 
    public static void FindPdfDocsWithInternalLinks() 
    { 
     foreach (var fi in PdfFiles) { 
      try { 
       var reader = new PdfReader(fi.FullName); 
       // Pagination 
       for(var i = 1; i <= reader.NumberOfPages; i++) { 
        var pageDict = reader.GetPageN(i); 
        var annotArray = (PdfArray)PdfReader.GetPdfObject(pageDict.Get(PdfName.ANNOTS)); 
        if (annotArray == null) continue; 
        if (annotArray.Size <= 0) continue; 
        // check every annotation on the page 
        foreach (var annot in annotArray.ArrayList) { 
         var annotDict = (PdfDictionary)PdfReader.GetPdfObject(annot); 
         if (annotDict == null) continue; 
         var subtype = annotDict.Get(PdfName.SUBTYPE).ToString(); 
         if (subtype != "/Link") continue; 
         var linkDict = (PdfDictionary)annotDict.GetDirectObject(PdfName.A); 
         if (linkDict == null) continue; 
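         // if it makes it this far, it's a link annotation with an action; 
         // only URI actions carry a /URI entry, so skip everything else 
         // instead of letting a null lookup abort the rest of the file 
         var uriObj = linkDict.Get(PdfName.URI); 
         if (uriObj == null) continue; 
         var sUri = uriObj.ToString(); 
         if (String.IsNullOrEmpty(sUri)) continue; 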
        } 
       } 
       reader.Close(); 
      } 
      catch (InvalidPdfException e) 
      { 
       if (!fi.FullName.Contains("_vti_cnf")) 
        Console.WriteLine("\r\nInvalid PDF Exception\r\nFilename: " + fi.FullName + "\r\nException:\r\n" + e); 
       continue; 
      } 
      catch (NullReferenceException e) 
      { 
       if (!fi.FullName.Contains("_vti_cnf")) 
        Console.WriteLine("\r\nNull Reference Exception\r\nFilename: " + fi.Name + "\r\nException:\r\n" + e); 
       continue; 
      } 
     } 

     // DO WHATEVER YOU WANT HERE 
    } 
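
As one possible shape for that custom logic, the helper below sorts a link annotation into the kinds discussed in this thread: external URI, internal GoTo, named action, or a bare destination. It is only a sketch; `LinkKinds` and `Describe` are hypothetical names, and it assumes the iTextSharp 5.x `GetAsDict`, `GetAsName`, and `GetAsString` accessors on `PdfDictionary`.

```csharp
using iTextSharp.text.pdf;

public static class LinkKinds
{
    // Hypothetical helper: returns a short description of what a link annotation does.
    public static string Describe(PdfDictionary annotDict)
    {
        PdfDictionary action = annotDict.GetAsDict(PdfName.A);
        if (action == null)
        {
            // Links without an action may still jump within the file via /Dest.
            return annotDict.Contains(PdfName.DEST) ? "internal link (direct /Dest)" : "no action";
        }

        PdfName kind = action.GetAsName(PdfName.S);
        if (PdfName.URI.Equals(kind))
        {
            PdfString uri = action.GetAsString(PdfName.URI);
            return "external URI: " + (uri == null ? "" : uri.ToString());
        }
        if (PdfName.GOTO.Equals(kind)) return "internal link (/GoTo action)";
        if (PdfName.NAMED.Equals(kind)) return "named action " + action.GetAsName(PdfName.N);
        return "other action " + kind;
    }
}
```

Inside the annotation loop, something like `Console.WriteLine(LinkKinds.Describe(annotDict));` would log each link before deciding whether to rewrite it.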

Good luck.