Reading PDF symbols and text using iText

I have been using the following C# code to read the text from a PDF file:

PdfReader reader = new PdfReader(openFileDialog1.FileName);
int n = reader.NumberOfPages;
// file properties
Dictionary<string, string> infodict = reader.Info;
string strText = string.Empty;
// extract the text of each page (iText page numbers are 1-based)
for (int page = 1; page <= n; page++)
{
    ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
    String s = PdfTextExtractor.GetTextFromPage(reader, page, its);
    s = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));
    strText = strText + s;
}
// close the reader once, after all pages have been read
reader.Close();
MessageBox.Show(strText);

This code cannot read the symbols in the PDF file. Is there any way to read symbols from a PDF file?


2011-10-20 fawad

What is this line for: s = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)))? I would remove this line completely. –

s reads the text of the whole page. – fawad

That line takes s, converts it to UTF-8, then encodes it as ASCII, then back to UTF-8. How does that "read" the page? –
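The loss the comments point at can be reproduced without any PDF. The sketch below repeats the questioned conversion, with Encoding.ASCII standing in for Encoding.Default; that substitution is an assumption made for illustration. On .NET Framework, Encoding.Default is the system ANSI code page (for example Windows-1252), which likewise has no bytes for most symbols, while on .NET Core and .NET 5+ it is UTF-8. The sample string and the RoundTrip helper are hypothetical.

```csharp
using System;
using System.Text;

public static class EncodingRoundTripDemo
{
    // Same shape as the questioned line, but the "narrow" encoding is passed in.
    // Assumption: Encoding.ASCII stands in for a code page that cannot represent symbols.
    public static string RoundTrip(string s, Encoding narrow)
    {
        // Characters the narrow encoding cannot represent are replaced with '?' here,
        // so the information is already gone before any conversion to UTF-8.
        byte[] narrowBytes = narrow.GetBytes(s);
        byte[] utf8Bytes = Encoding.Convert(narrow, Encoding.UTF8, narrowBytes);
        return Encoding.UTF8.GetString(utf8Bytes);
    }

    public static void Main()
    {
        // Hypothetical text as it might come back from GetTextFromPage.
        string extracted = "α ≥ 5 °C";
        string result = RoundTrip(extracted, Encoding.ASCII); // "? ? 5 ?C"
        Console.WriteLine(result);
    }
}
```

Because s is already a .NET (UTF-16) string, it needs no conversion at all; appending the value returned by GetTextFromPage directly keeps every character iText managed to extract.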

Try using LocationTextExtractionStrategy instead of SimpleTextExtractionStrategy.


2014-05-21 05:11:49 Binod

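Both strategies use the same character information: LocationTextExtractionStrategy merely tries to sort it, while SimpleTextExtractionStrategy assumes it already arrives in the correct order. So if *the code cannot read the symbols in the PDF file*, swapping one strategy for the other will not help. – mkl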
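mkl's point can be illustrated with a deliberately simplified model. The TextChunk record and both helper methods below are hypothetical and are not iText types; the real LocationTextExtractionStrategy also takes text orientation and spacing into account. The sketch only shows that sorting chunks by position changes their order, never which characters they contain.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified chunk: the text plus the baseline position where it was drawn.
public record TextChunk(string Text, float X, float Y);

public static class ChunkOrdering
{
    // "Simple" behaviour: trust the order in which the content stream drew the chunks.
    public static string InStreamOrder(IEnumerable<TextChunk> chunks) =>
        string.Concat(chunks.Select(c => c.Text));

    // "Location" behaviour (simplified): top to bottom (PDF y grows upwards),
    // then left to right, before concatenating.
    public static string InReadingOrder(IEnumerable<TextChunk> chunks) =>
        string.Concat(chunks.OrderByDescending(c => c.Y)
                            .ThenBy(c => c.X)
                            .Select(c => c.Text));

    public static void Main()
    {
        // The right-hand part of the line is drawn first in the content stream.
        var chunks = new List<TextChunk>
        {
            new TextChunk(" ≥ 5", 40f, 700f),
            new TextChunk("α", 10f, 700f),
        };
        string simple = InStreamOrder(chunks);   // " ≥ 5α"
        string located = InReadingOrder(chunks); // "α ≥ 5"
        Console.WriteLine(simple);
        Console.WriteLine(located);
    }
}
```

Both results contain exactly the same characters, so a symbol that is missing from the extracted text cannot be recovered just by switching strategies.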
