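Improving Tesseract OCR quality fails

I am currently using Tesseract to scan receipts. The quality was poor, so I read this article on how to improve it: https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality#noise-removal. I implemented resizing, deskewing (alignment), and Gaussian blur. However, apart from deskewing, none of them had a positive effect on OCR accuracy. Below is my code for resizing and Gaussian blur. Am I doing something wrong? If not, what else can I do to improve the results?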
Code:

    + (UIImage *)prepareImage:(UIImage *)image {
        // convert UIImage to Mat format
        Mat im = cvMatWithImage(image);

        // convert to grayscale
        Mat gray;
        cvtColor(im, gray, CV_BGR2GRAY);

        // deskew text
        // (deskew helpers omitted because I know they work)
        Mat preprocessed = preprocess2(gray);
        double skew = hough_transform(preprocessed, im);
        Mat rotated = rot(im, skew * CV_PI / 180);

        // resize image
        Mat scaledImage = scaleImage(rotated, 2);

        // Gaussian blur
        GaussianBlur(scaledImage, scaledImage, cv::Size(1, 1), 0, 0);

        return UIImageFromCVMat(scaledImage);
    }

    // Organization -> Resizing
    Mat scaleImage(Mat mat, double factor) {
        Mat resizedMat;
        double width = mat.cols;
        double height = mat.rows;
        double aspectRatio = width / height;
        resize(mat, resizedMat, cv::Size(width * factor * aspectRatio, height * factor * aspectRatio));
        return resizedMat;
    }
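
Two details in the code above may be worth checking, and the arithmetic can be sketched without OpenCV. The size passed to `resize` multiplies both sides by `factor * aspectRatio` rather than by `factor` alone, and a Gaussian kernel of size 1 has a single weight equal to 1, so it leaves every pixel unchanged. The sketch below is only an illustration under assumptions: `Size2i`, `sizeFromQuestion`, `sizeUniform`, and `gaussianWeights` are hypothetical names that do not appear in the original code, and the 600x1200 image size used in the usage note is made up.

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-in for cv::Size so the sketch compiles without OpenCV.
struct Size2i {
    int width;
    int height;
};

// Target size as computed by the expression in scaleImage above:
// both sides are multiplied by factor * (cols / rows), then truncated to int.
Size2i sizeFromQuestion(int cols, int rows, double factor) {
    double width = cols;
    double height = rows;
    double aspectRatio = width / height;
    return { static_cast<int>(width * factor * aspectRatio),
             static_cast<int>(height * factor * aspectRatio) };
}

// Target size when both sides are multiplied by the factor alone.
Size2i sizeUniform(int cols, int rows, double factor) {
    return { static_cast<int>(std::lround(cols * factor)),
             static_cast<int>(std::lround(rows * factor)) };
}

// Normalized 1-D Gaussian weights for an odd kernel size and sigma > 0.
std::vector<double> gaussianWeights(int ksize, double sigma) {
    std::vector<double> weights(ksize);
    int center = ksize / 2;
    double sum = 0.0;
    for (int i = 0; i < ksize; ++i) {
        double x = i - center;
        weights[i] = std::exp(-(x * x) / (2.0 * sigma * sigma));
        sum += weights[i];
    }
    for (double &w : weights) {
        w /= sum;  // normalize so the weights sum to 1
    }
    return weights;
}
```

For a portrait image of 600x1200 pixels and a factor of 2, `sizeFromQuestion` gives 600x1200 (no change at all, because the aspect ratio is 0.5), while `sizeUniform` gives 1200x2400. Likewise, `gaussianWeights(1, sigma)` always returns the single weight 1. In OpenCV terms, one thing to try would be `resize(mat, resizedMat, cv::Size(), factor, factor, cv::INTER_CUBIC)` together with a blur kernel of at least `cv::Size(3, 3)`. Whether either change improves Tesseract's accuracy on these receipts would need to be measured.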

Receipt: [image]

Maybe [this link](http://www.danvk.org/2015/01/11/training-an-ocropus-ocr-model.html) will help – sturkmen