Recognizing Arabic text with Tesseract on Android

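I am developing an app that uses Tesseract OCR to recognize text in images. I tested it with English and Japanese and it works fine, but when I try Arabic, the app crashes before it even finishes starting. Why?

What is the problem with Arabic and Tesseract OCR? Can anyone tell me?

Code: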

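import android.content.res.AssetManager;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.os.Bundle;
import android.support.v7.app.AppCompatActivity;
import android.view.View;
import android.widget.TextView;

import com.googlecode.tesseract.android.TessBaseAPI;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class MainActivity extends AppCompatActivity {

Bitmap image;
private TessBaseAPI mTess;
String datapath = "";

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);

    // Load the test image
    image = BitmapFactory.decodeResource(getResources(), R.drawable.test_ara);

    // Initialize the Tesseract API; the language code must match ara.traineddata
    String language = "ara";
    datapath = getFilesDir() + "/tesseract/";
    mTess = new TessBaseAPI();

    // Make sure the trained data is on disk before init() reads it
    checkFile(new File(datapath + "tessdata/"));

    mTess.init(datapath, language);
}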

public void processImage(View view){ 
    String OCRresult = null; 
    mTess.setImage(image); 
    OCRresult = mTess.getUTF8Text(); 
    TextView OCRTextView = (TextView) findViewById(R.id.OCRTextView); 
    OCRTextView.setText(OCRresult); 
} 

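private void checkFile(File dir) {
    // The directory did not exist and was just created: copy the trained data
    if (!dir.exists() && dir.mkdirs()) {
        copyFiles();
    }
    // The directory exists but the trained data is missing: copy it
    if (dir.exists()) {
        String datafilepath = datapath + "/tessdata/ara.traineddata";
        File datafile = new File(datafilepath);

        if (!datafile.exists()) {
            copyFiles();
        }
    }
}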

private void copyFiles() {
    try {
        String filepath = datapath + "/tessdata/ara.traineddata";
        AssetManager assetManager = getAssets();

        InputStream instream = assetManager.open("tessdata/ara.traineddata");
        OutputStream outstream = new FileOutputStream(filepath);

        byte[] buffer = new byte[1024];
        int read;
        while ((read = instream.read(buffer)) != -1) {
            outstream.write(buffer, 0, read);
        }

        outstream.flush();
        outstream.close();
        instream.close();

        File file = new File(filepath);
        if (!file.exists()) {
            throw new FileNotFoundException();
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
}

The error I get:

04-16 18:37:08.451 7405-7405/com.imperialsoupgmail.tesseractexample A/libc: Fatal signal 11 (SIGSEGV), code 1, fault addr 0x0 in tid 7405 (esseractexample)

Source

2017-04-16 Lama Tatwany

Post the exact error text when it crashes. – sashoalm

@sashoalm I just posted it. –

For Arabic, you need to use Cube: call init() with the OEM_CUBE_ONLY engine mode and use the Cube data files.
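To make the advice above a little more concrete, here is a minimal sketch in plain Java that reports which Arabic data files are missing from a tessdata directory before init() is called. The file list is an assumption based on the Tesseract 3.x Arabic data package and should be checked against the files you actually download; `CubeDataCheck` and `missingFiles` are hypothetical names, not part of tess-two.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class CubeDataCheck {
    // Assumed contents of the Tesseract 3.x Arabic data package (verify against your download).
    static final String[] ARABIC_DATA_FILES = {
        "ara.traineddata",
        "ara.cube.bigrams",
        "ara.cube.fold",
        "ara.cube.lm",
        "ara.cube.nn",
        "ara.cube.params",
        "ara.cube.size",
        "ara.cube.word-freq"
    };

    /** Returns the expected data files that are absent or empty in tessdataDir. */
    static List<String> missingFiles(File tessdataDir) {
        List<String> missing = new ArrayList<>();
        for (String name : ARABIC_DATA_FILES) {
            File f = new File(tessdataDir, name);
            if (!f.isFile() || f.length() == 0) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

In the activity from the question, this could be called as `CubeDataCheck.missingFiles(new File(datapath, "tessdata"))` and the result logged before initializing. With tess-two, the engine mode is passed as a third argument, along the lines of `mTess.init(datapath, "ara", TessBaseAPI.OEM_CUBE_ONLY)`; confirm the overload and constant against the tess-two version you are using.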

Source

2017-04-16 13:44:59 rmtheis

I just posted the code. Could you tell me where I should change it and how? Or is there a good example of how to use Cube? –
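One part of the posted code that would likely need to change is copyFiles(), which copies only ara.traineddata. A rough sketch of a reusable copy helper follows, assuming the Cube files are bundled under the same tessdata/ assets folder; `StreamCopy` is a hypothetical name introduced here for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamCopy {
    /** Copies all bytes from in to out and returns how many bytes were copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }
}
```

In the activity, copyFiles() could then loop over every required file name and call `StreamCopy.copy(assetManager.open("tessdata/" + name), new FileOutputStream(datapath + "/tessdata/" + name))`, closing both streams afterwards. A returned length of zero is a cheap signal that an asset was empty or not packaged correctly.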

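Thanks! It works now :) but the recognition results are wrong! Do you know why? –

I can't say without more information. Ask a new question and include all the details. – rmtheis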
