2015-05-14 22 views

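tess-two OCR not decoding correctly

I have followed the tutorials to get Tesseract, specifically tess-two and eyes-two, installed and working as part of my Android application.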

It runs, but the OCR text returned from baseApi.getUTF8Text(); is complete gibberish.

    // Decode at 1/4 of the original width and height to save memory.
    // Note that downsampling also lowers the resolution Tesseract sees.
    BitmapFactory.Options options = new BitmapFactory.Options();
    options.inSampleSize = 4;
    Bitmap bmp = BitmapFactory.decodeFile(path, options);
    receipt.setImageBitmap(bmp);

    try {
        // Read the EXIF orientation so the bitmap can be turned upright.
        ExifInterface exif = new ExifInterface(path);
        int exifOrientation = exif.getAttributeInt(ExifInterface.TAG_ORIENTATION, ExifInterface.ORIENTATION_NORMAL);
        int rotate = 0;
        switch (exifOrientation) {
            case ExifInterface.ORIENTATION_ROTATE_90: rotate = 90; break;
            case ExifInterface.ORIENTATION_ROTATE_180: rotate = 180; break;
            case ExifInterface.ORIENTATION_ROTATE_270: rotate = 270; break;
        }
        if (rotate != 0) {
            int w = bmp.getWidth();
            int h = bmp.getHeight();
            Matrix matrix = new Matrix();
            matrix.preRotate(rotate);
            bmp = Bitmap.createBitmap(bmp, 0, 0, w, h, matrix, false);
        }

        // Convert to ARGB_8888 before handing the bitmap to tess-two.
        bmp = bmp.copy(Bitmap.Config.ARGB_8888, true);

        TessBaseAPI baseApi = new TessBaseAPI();
        baseApi.init(DATA_PATH, "eng");
        baseApi.setImage(bmp);
        String OCRText = baseApi.getUTF8Text();
        baseApi.end();

        Log.i("OCR Text", "rotate " + rotate);
        Log.i("OCR Text", "OCR ");
        Log.i("OCR Text", OCRText);
        Log.i("OCR Text", "=======================================================================================");
    } catch (IOException e) {
        // ExifInterface throws IOException when the file cannot be read.
        Log.e("OCR Text", "Could not read EXIF data from " + path, e);
    }

Taking a picture that contains OCR characters returns:

05-14 11:01:59.131: I/OCR Text(18199): rotate 90 
05-14 11:01:59.131: I/OCR Text(18199): OCR 
05-14 11:01:59.131: I/OCR Text(18199): 4— ‘ ‘ 
05-14 11:01:59.131: I/OCR Text(18199): \Dxfi ‘ 
05-14 11:01:59.131: I/OCR Text(18199): I W man"! no Accounv 
05-14 11:01:59.131: I/OCR Text(18199): 1’ 
05-14 11:01:59.131: I/OCR Text(18199): my... «unblm m. mm. 
05-14 11:01:59.131: I/OCR Text(18199): :~A 
05-14 11:01:59.131: I/OCR Text(18199): «Ln. 
05-14 11:01:59.131: I/OCR Text(18199): ‘ 「w 「IN. N I 「H‘M‘ 
05-14 11:01:59.131: I/OCR Text(18199): mmnwnmw- .; k. ' 
05-14 11:01:59.131: I/OCR Text(18199): Wilt-run」. uni」 nl 
05-14 11:01:59.131: I/OCR Text(18199): mam. I 
05-14 11:01:59.131: I/OCR Text(18199): ======================================================================================= 

Any suggestions on how to clean this up and get correct OCR recognition of a check? The device used is a Samsung Galaxy 7".


The Samsung Galaxy Tab 2 7" has no autofocus on its main (rear) camera, so you are unlikely to get better results unless you use a different device. – rmtheis

Answers


You can use something like:

OCRText = OCRText.replaceAll("[^a-zA-Z0-9]+", " "); 
OCRText = OCRText.trim(); 

It is based on a Tesseract implementation I found here: SimpleAndroidOCRActivity.java
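To make the effect of that cleanup concrete, here is a minimal standalone sketch that applies the same regular expression to one line of the log output from the question. The class and method names (OcrCleanup, cleanOcrText) are hypothetical and are not part of tess-two or SimpleAndroidOCRActivity.java.

```java
class OcrCleanup {
    /**
     * Replaces every run of characters other than ASCII letters and digits
     * with a single space, then trims leading and trailing whitespace.
     * Hypothetical helper, for illustration only.
     */
    static String cleanOcrText(String raw) {
        return raw.replaceAll("[^a-zA-Z0-9]+", " ").trim();
    }

    public static void main(String[] args) {
        // One line taken from the logged OCR output in the question.
        String raw = "I W man\"! no Accounv";
        System.out.println(cleanOcrText(raw)); // prints: I W man no Accounv
    }
}
```

This only strips stray punctuation and symbols. Letters that were misrecognized (for example "Accounv") pass through unchanged, so it cannot recover text from a blurry capture.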


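Thanks. But I believe this may be related to focus. If I scan with the front-facing camera (which has autofocus), the accuracy is about 90%, which makes more sense. When I scan with the rear camera (which has no autofocus), I get the gibberish above. It should be a name and address. – NewDev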
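If focus is the cause, one possible mitigation is to measure sharpness before running OCR and ask the user to retake a blurry picture. The sketch below is not from the thread: it computes the variance of a simple Laplacian over grayscale values, a common sharpness heuristic. The names (FocusCheck, laplacianVariance) are hypothetical, and it assumes the grayscale array has already been extracted from the bitmap (on Android, for example, via Bitmap.getPixels). Any rejection threshold would have to be tuned on real captures from the device.

```java
class FocusCheck {
    /**
     * Variance of a 4-neighbour Laplacian over a grayscale image (values 0..255).
     * Higher values mean more edge contrast, which usually indicates a sharper image.
     * Border pixels are skipped. Hypothetical helper, for illustration only.
     */
    static double laplacianVariance(int[][] gray) {
        int h = gray.length;
        int w = h == 0 ? 0 : gray[0].length;
        double sum = 0.0;
        double sumSq = 0.0;
        int n = 0;
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                int lap = gray[y - 1][x] + gray[y + 1][x]
                        + gray[y][x - 1] + gray[y][x + 1]
                        - 4 * gray[y][x];
                sum += lap;
                sumSq += (double) lap * lap;
                n++;
            }
        }
        if (n == 0) {
            return 0.0; // image too small to have interior pixels
        }
        double mean = sum / n;
        return sumSq / n - mean * mean;
    }

    public static void main(String[] args) {
        int[][] flat = new int[5][5];    // uniform image: no edges at all
        int[][] checker = new int[5][5]; // alternating 0/255: maximal edge contrast
        for (int y = 0; y < 5; y++) {
            for (int x = 0; x < 5; x++) {
                flat[y][x] = 128;
                checker[y][x] = ((x + y) % 2 == 0) ? 0 : 255;
            }
        }
        System.out.println("flat:    " + laplacianVariance(flat));
        System.out.println("checker: " + laplacianVariance(checker));
    }
}
```

A score close to zero suggests there is little fine detail for Tesseract to work with, in which case retaking the photo is likely to help more than post-processing the text.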