Using iTextSharp to split a PDF into smaller PDFs based on size

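So we have some really inefficient code that splits a PDF into smaller chunks based on a maximum allowed size. That is, if the maximum size is 10 MB, an 8 MB file is skipped, while a 16 MB file is split based on the number of pages.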

This is code that I inherited, and I feel there must be a more efficient way to do this, using only one method and fewer instantiated objects.

We use the following code to call the methods:

    List<int> splitPoints = null;
    List<byte[]> documents = null;

    splitPoints = this.GetPDFSplitPoints(currentDocument, maxSize);
    documents = this.SplitPDF(currentDocument, maxSize, splitPoints);

The methods:

private List<int> GetPDFSplitPoints(IClaimDocument currentDocument, int maxSize) 
    { 
     List<int> splitPoints = new List<int>(); 
     PdfReader reader = null; 
     Document document = null; 
     int pagesRemaining = currentDocument.Pages; 

     while (pagesRemaining > 0) 
     { 
      reader = new PdfReader(currentDocument.Data); 
      document = new Document(reader.GetPageSizeWithRotation(1)); 

      using (MemoryStream ms = new MemoryStream()) 
      { 
       PdfCopy copy = new PdfCopy(document, ms); 
       PdfImportedPage page = null; 

       document.Open(); 

       //Add pages until we run out from the original 
       for (int i = 0; i < currentDocument.Pages; i++) 
       { 
        int currentPage = currentDocument.Pages - (pagesRemaining - 1); 

        if (pagesRemaining == 0) 
        { 
         //The whole document has been traversed 
         break; 
        } 

        page = copy.GetImportedPage(reader, currentPage); 
        copy.AddPage(page); 

        //If the current collection of pages exceeds the maximum size, we save off the index and start again 
        if (copy.CurrentDocumentSize > maxSize) 
        { 
         if (i == 0) 
         { 
          //One page is greater than the maximum size 
          throw new Exception("one page is greater than the maximum size and cannot be processed"); 
         } 

         //We have gone one page too far, save this split index 
         splitPoints.Add(currentDocument.Pages - (pagesRemaining - 1)); 
         break; 
        } 
        else 
        { 
         pagesRemaining--; 
        } 
       } 

       page = null; 

       document.Close(); 
       document.Dispose(); 
       copy.Close(); 
       copy.Dispose(); 
       copy = null; 
      } 
     } 

     if (reader != null) 
     { 
      reader.Close(); 
      reader = null; 
     } 

     document = null; 

     return splitPoints; 
    } 

    private List<byte[]> SplitPDF(IClaimDocument currentDocument, int maxSize, List<int> splitPoints) 
    { 
     var documents = new List<byte[]>(); 
     PdfReader reader = null; 
     Document document = null; 
     MemoryStream fs = null; 
     int pagesRemaining = currentDocument.Pages; 

     while (pagesRemaining > 0) 
     { 
      reader = new PdfReader(currentDocument.Data); 
      document = new Document(reader.GetPageSizeWithRotation(1)); 

      fs = new MemoryStream(); 
      PdfCopy copy = new PdfCopy(document, fs); 
      PdfImportedPage page = null; 

      document.Open(); 

      //Add pages until we run out from the original 
      for (int i = 0; i <= currentDocument.Pages; i++) 
      { 
       int currentPage = currentDocument.Pages - (pagesRemaining - 1); 
       if (pagesRemaining == 0) 
       { 
        //We have traversed all pages 
        //The call to copy.Close() MUST come before using fs.ToArray() because copy.Close() finalizes the document 
        fs.Flush(); 
        copy.Close(); 
        documents.Add(fs.ToArray()); 
        document.Close(); 
        fs.Dispose(); 
        break; 
       } 

       page = copy.GetImportedPage(reader, currentPage); 
       copy.AddPage(page); 
       pagesRemaining--; 

       if (splitPoints.Contains(currentPage + 1) == true) 
       { 
        //Need to start a new document 
        //The call to copy.Close() MUST come before using fs.ToArray() because copy.Close() finalizes the document 
        fs.Flush(); 
        copy.Close(); 
        documents.Add(fs.ToArray()); 
        document.Close(); 
        fs.Dispose(); 
        break; 
       } 
      } 

      copy = null; 
      page = null; 

      fs.Dispose(); 
     } 

     if (reader != null) 
     { 
      reader.Close(); 
      reader = null; 
     } 

     if (document != null) 
     { 
      document.Close(); 
      document.Dispose(); 
      document = null; 
     } 

     if (fs != null) 
     { 
      fs.Close(); 
      fs.Dispose(); 
      fs = null; 
     } 

     return documents; 
    }

As far as I can tell, the only code I can find online is in VB and doesn't necessarily address the size issue.

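UPDATE:

We are experiencing OutOfMemory exceptions, and I believe this is an issue with the Large Object Heap. So one thought was to reduce the code footprint, which might reduce the number of large objects on the heap.

Basically, this is part of a loop that goes through any number of PDFs, splits them, and stores them in the database. Right now, we have had to change the method from processing all of them at once (the last run was 97 PDFs of various sizes) to running 5 PDFs through the system every 5 minutes. This is not ideal and won't scale well when we roll the tool out to more clients.

(We are dealing with 50 - 100 MB PDFs, but they could be larger.)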
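One way to picture the memory side of this: in .NET, arrays of 85,000 bytes or more are allocated on the Large Object Heap, and `MemoryStream.ToArray()` returns a copy of the buffer, so collecting every chunk into a `List<byte[]>` keeps all of those large arrays alive at once. The following is only a minimal sketch of handing chunks to storage one at a time instead. `ChunkPipeline`, `StreamChunks`, `StoreAll`, and the `loaders`, `split`, and `store` delegates are hypothetical names standing in for the real loading, iTextSharp splitting, and database code; none of them come from the original post.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkPipeline
{
    // Hypothetical helper: lazily walks the source documents and yields
    // one chunk at a time, so the caller never holds a full list of chunks.
    public static IEnumerable<byte[]> StreamChunks(
        IEnumerable<Func<byte[]>> loaders,
        Func<byte[], IEnumerable<byte[]>> split)
    {
        foreach (Func<byte[]> load in loaders)
        {
            // Load one source document only when it is about to be split.
            byte[] source = load();

            foreach (byte[] chunk in split(source))
            {
                yield return chunk;
            }
        }
    }

    // Hypothetical helper: stores each chunk as it arrives and returns
    // how many chunks were stored.
    public static int StoreAll(IEnumerable<byte[]> chunks, Action<byte[]> store)
    {
        int count = 0;
        foreach (byte[] chunk in chunks)
        {
            store(chunk);
            count++;
        }
        return count;
    }
}
```

This only helps if `split` itself yields lazily and `store` does not keep a reference to the array. It reduces how many large buffers are reachable at the same time, but it does not by itself guarantee that the OutOfMemory exceptions go away.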

Source

2012-01-26 Cyfer13

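IMHO, if it works, leave it alone. I don't think there *is* a good way to split a PDF by size, because predicting the size of a page is very difficult. A page might be small because it holds 1000 words (relatively small), or a page might be very large because it has a high-resolution image embedded. – CodingGorilla

We are getting OutOfMemory exceptions, and I believe it is an issue with the Large Object Heap. So one thought was to reduce the code footprint, which might reduce the number of large objects on the heap. (We are dealing with 50 - 100 MB PDFs, but they could be larger.) – Cyfer13

If it weren't for the errors, I wouldn't be touching working code. – Cyfer13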

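I also inherited this exact code, and it appears to have a major flaw. In the GetPDFSplitPoints method, it checks the total size of the copied pages against the maximum size to determine at which page to split the file.
In the SplitPDF method, when it reaches the page where the split occurs, the MemoryStream at that point is indeed below the maximum allowed size, and one more page would put it over the limit. But after document.Close(); is executed, much more is added to the MemoryStream (in one example PDF I worked with, the Length of the MemoryStream went from 9 MB before document.Close() to 19 MB after it). My understanding is that all the necessary resources for the copied pages are added on Close.
I'm guessing I will have to rewrite this code completely to ensure that the maximum size is not exceeded while preserving the integrity of the original pages.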
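To illustrate the idea of checking the size only after the chunk has been finalized, here is a minimal sketch, not a definitive rewrite. `SizeCheckedSplitter`, `Plan`, and the `finalizedSize` callback are hypothetical names. The callback is assumed to build a chunk for the given 1-based page range, close it, and return the resulting byte count (with iTextSharp this could, for example, be the length of the output stream after `document.Close()`); the sketch itself contains no iTextSharp calls.

```csharp
using System;
using System.Collections.Generic;

public static class SizeCheckedSplitter
{
    // Returns 1-based inclusive page ranges whose *finalized* size is
    // at most maxSize. finalizedSize(first, last) is a hypothetical
    // callback that reports the size of a closed chunk for those pages.
    public static List<Tuple<int, int>> Plan(
        int pageCount, long maxSize, Func<int, int, long> finalizedSize)
    {
        var ranges = new List<Tuple<int, int>>();
        int first = 1;

        while (first <= pageCount)
        {
            // A single page that is already too large cannot be split further.
            if (finalizedSize(first, first) > maxSize)
            {
                throw new InvalidOperationException(
                    "Page " + first + " alone exceeds the maximum size.");
            }

            // Greedily extend the range while the finalized chunk still fits.
            int last = first;
            while (last < pageCount && finalizedSize(first, last + 1) <= maxSize)
            {
                last++;
            }

            ranges.Add(Tuple.Create(first, last));
            first = last + 1;
        }

        return ranges;
    }
}
```

The trade-off is that every candidate range is rebuilt and measured, which costs more work per chunk than watching a running size, but the limit is checked against what would actually be written. A real implementation would likely want to cache or bound those measurements.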

Source

2012-10-13 20:15:07
