iText does not return the PDF's text content correctly after the first page

I want to use the iText library for C# to extract the text of a PDF file.

I created a PDF by exporting from Excel 2013, then copied a sample from the web showing how to use iText (after adding the library reference to my project).

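It reads the first page perfectly, but from the second page on the output is garbled: it keeps part of the first page and merges it with the text of the next page. The commented-out lines are left over from my attempts to fix this. The string "thePage" is already recreated inside the for loop.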

Here is the code. I can also email it to anyone who can help solve this problem.

Thanks in advance.

public static string ExtractTextFromPdf(string path)
{
    ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.LocationTextExtractionStrategy();

    using (PdfReader reader = new PdfReader(path))
    {
        StringBuilder text = new StringBuilder();

        //string[] theLines;
        //theLines = new string[COLUMNS];
        //string thePage;

        for (int i = 1; i <= reader.NumberOfPages; i++)
        {
            string thePage = "";
            thePage = PdfTextExtractor.GetTextFromPage(reader, i, its);

            string[] theLines = thePage.Split('\n');
            foreach (var theLine in theLines)
            {
                text.AppendLine(theLine);
            }
            // text.AppendLine(" ");
            // Array.Clear(theLines, 0, theLines.Length);
            // thePage = "";
        }
        return text.ToString();
    }
}

2014-07-24 user1555945

Use a new strategy object for each page. The strategy object accumulates text data and does not know when a new page has started. – mkl

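Thanks, I instantiated the "its" object inside the loop and that solved the problem. I know I should mark an answer as having solved the problem, but I don't know how. Either way, thanks for your response. – user1555945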

I only posted a comment, not an answer. I'll write it up as a proper answer, which you can then accept by clicking the check mark to its left. – mkl
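
For anyone reading later, the fix described in the comments above might look roughly like the sketch below. It assumes iTextSharp 5.x (the iTextSharp.text.pdf and iTextSharp.text.pdf.parser namespaces). The class name PdfTextHelper and the variable names are only illustrative and are not from the original post. The one change that matters is that the strategy object is created inside the loop, so each page starts with an empty one.

```csharp
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical wrapper class, added only so the sketch compiles on its own.
public static class PdfTextHelper
{
    public static string ExtractTextFromPdf(string path)
    {
        using (PdfReader reader = new PdfReader(path))
        {
            StringBuilder text = new StringBuilder();

            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                // Create a fresh strategy for every page. A reused strategy
                // keeps the text chunks it collected from earlier pages, which
                // is what mixed page 1 into the later pages in the question.
                ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();

                string pageText = PdfTextExtractor.GetTextFromPage(reader, i, strategy);
                text.AppendLine(pageText);
            }

            return text.ToString();
        }
    }
}
```

Calling PdfTextHelper.ExtractTextFromPdf with the path of the exported file should then return the pages one after another instead of merged. Note that this version appends each page as a single block rather than splitting it on '\n' first, so the line endings inside a page stay as iTextSharp returned them.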