Extracting text from an image

I am trying to extract text from an image.

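Initially the image was colored, with the text in white. After further processing, the text appears in black and all other pixels are white (with some noise). Here is a sample: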

Now, when I try to OCR it with pytesseract (tesseract), I still don't get any text back.

Is there a way to extract text from a colored image?

Source

2017-09-17 Yash Arora

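Convert the color image to grayscale and apply a binary threshold so that everything is either black or white. You can try despeckling or noise removal, but if 'tesseract' on the command line can't extract the text, then I'd recommend 'ocropy' from Google.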

Have you tried getting help from [Adrian Rosebrock's blog](http://www.pyimagesearch.com/2017/07/10/using-tesseract-ocr-python/)?

In principle it should be possible: your image works very well with Google OCR, and about half of it is recognized by ocr.space. I tested it at https://ocr.space/compare-ocr-software

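from PIL import Image
import pytesseract
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image (OpenCV returns it in BGR channel order)
image = cv2.imread(args["image"])
if image is None:
    raise SystemExit("Could not read image: " + args["image"])
cv2.imshow("Original", image)

# apply an "average" blur to the image to suppress the noise
blurred = cv2.blur(image, (3, 3))
cv2.imshow("Blurred_image", blurred)

# convert BGR to RGB for PIL, then run Tesseract on the blurred image
img = Image.fromarray(cv2.cvtColor(blurred, cv2.COLOR_BGR2RGB))
text = pytesseract.image_to_string(img, lang='eng')
print(text)

# keep the preview windows open until a key is pressed
cv2.waitKey(0)
cv2.destroyAllWindows()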

As a result I get: "Accommodation: in an Overwoter bungalow $ 3. »"

How about using contours to find and remove the unwanted blobs? That might work.

Source

2017-09-20 07:03:18

Thanks, I'll give it a try and post the results.
