2016-11-30 218 views

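Converting a scanned PDF to an image

I can use Tesseract to OCR JPG images, and I can use iTextSharp to read regular PDFs and extract their text. What I can't find is a way to get the text out of a scanned document saved with a .pdf extension, or a way to convert that PDF into images so that I can then run Tesseract on them. Is there an option I have missed? Thanks!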

Answers

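Assume you have a scanned PDF document, and assume further that you have only the text from that document. You can generate an image of that text with the method below: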

// Requires: using System; using System.Drawing;
private Image DrawText(String text, Font font, Color textColor, Color backColor)
{
    SizeF textSize;

    // First, create a dummy 1x1 bitmap just to get a Graphics object for measuring.
    using (Image dummy = new Bitmap(1, 1))
    using (Graphics measure = Graphics.FromImage(dummy))
    {
        // Measure the string to see how big the image needs to be.
        textSize = measure.MeasureString(text, font);
    }

    // Round up so the text is not clipped; a Bitmap needs at least 1x1 pixels.
    int width = Math.Max(1, (int)Math.Ceiling(textSize.Width));
    int height = Math.Max(1, (int)Math.Ceiling(textSize.Height));

    // Create a new image of the right size.
    Image img = new Bitmap(width, height);

    using (Graphics drawing = Graphics.FromImage(img))
    using (Brush textBrush = new SolidBrush(textColor))
    {
        // Paint the background, then draw the text at the top-left corner.
        drawing.Clear(backColor);
        drawing.DrawString(text, font, textBrush, 0, 0);
    }

    // The caller owns the returned image and should dispose it.
    return img;
}

Reference: How to generate an image from text on fly at runtime
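
To show how the method might be called, here is a minimal, self-contained sketch. It assumes a platform where System.Drawing is available. The class name `TextImageDemo`, the font ("Arial", 24 pt), the sample string, and the output file name `text.png` are all arbitrary choices made for illustration; none of them come from the original answer. The method body is a compact static copy of `DrawText` so that the example compiles on its own.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;

public static class TextImageDemo
{
    // Compact static copy of DrawText so this sketch is self-contained.
    public static Image DrawText(string text, Font font, Color textColor, Color backColor)
    {
        SizeF size;
        using (var probe = new Bitmap(1, 1))
        using (var g = Graphics.FromImage(probe))
        {
            size = g.MeasureString(text, font);
        }

        int width = Math.Max(1, (int)Math.Ceiling(size.Width));
        int height = Math.Max(1, (int)Math.Ceiling(size.Height));

        var img = new Bitmap(width, height);
        using (var g = Graphics.FromImage(img))
        using (var brush = new SolidBrush(textColor))
        {
            g.Clear(backColor);
            g.DrawString(text, font, brush, 0, 0);
        }
        return img;
    }

    public static void Main()
    {
        // Font, text, and file name are assumptions made for this example only.
        using (var font = new Font("Arial", 24))
        using (Image img = DrawText("Hello, OCR", font, Color.Black, Color.White))
        {
            // Save as PNG so the file could be passed to an OCR tool such as Tesseract.
            img.Save("text.png", ImageFormat.Png);
            Console.WriteLine("Saved " + img.Width + "x" + img.Height + " image");
        }
    }
}
```

Black text on a white background is used here because high contrast generally suits OCR input; the exact image dimensions depend on the font that is actually installed.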