Java Tess4J doOCR() not working, exception "Invalid memory access"

I'm working on a dynamic web project in Eclipse, and I created a TesseractOCR class with the following content:

    import java.io.File;

    import net.sourceforge.tess4j.Tesseract;
    import net.sourceforge.tess4j.Tesseract1;

    public class TesseractOCR {

        public TesseractOCR() {
        }

        public String doOCR(String file) {
            System.setProperty("jna.library.path",
                    "32".equals(System.getProperty("sun.arch.data.model")) ? "lib/win32-x86" : "lib/win32-x86-64");
            // Note: the file parameter is ignored; a hard-coded image path is used instead.
            File imageFile = new File("C:\\Users\\Sherein Dabbah\\Downloads\\ca096-d7a6d799d7a1d798d799d7a72.jpg");
            Tesseract instance = Tesseract.getInstance(); // JNA Interface Mapping
            Tesseract1 instance1 = new Tesseract1();
            instance.setLanguage("heb+eng");
            // Tesseract1 instance = new Tesseract1(); // JNA Direct Mapping
            // File tessDataFolder = LoadLibs.extractTessResources("tessdata"); // Maven build bundles English data
            // instance.setDatapath(tessDataFolder.getAbsolutePath());
            String sub = "";
            try {
                String result = instance.doOCR(imageFile);
                // Extract the text between the two Hebrew section markers.
                int indx1 = 6 + result.indexOf("אבחנות");
                int indx2 = result.indexOf("הפניות");
                sub = result.substring(indx1, indx2 - 1);
                System.out.println(sub);
            } catch (Exception e) {
                System.err.println(e.getMessage());
            }
            return sub;
        }
    }
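The substring step after doOCR is fragile even once OCR works: it adds a hard-coded 6 to skip the first marker and assumes both markers are present, but indexOf returns -1 when a marker is missing, which makes substring throw. This is separate from the "Invalid memory access" error, which is raised inside doOCR before this code runs. A minimal JDK-only sketch of a safer extraction follows; the class and method names are hypothetical, and it trims surrounding whitespace instead of dropping one character with indx2 - 1.

```java
import java.util.Optional;

public class MarkerExtractor {

    /**
     * Returns the text between the first occurrence of startMarker and the
     * first occurrence of endMarker that follows it, or Optional.empty()
     * if either marker is missing.
     */
    static Optional<String> between(String text, String startMarker, String endMarker) {
        int start = text.indexOf(startMarker);
        if (start < 0) {
            return Optional.empty(); // start marker not found
        }
        start += startMarker.length(); // skip the marker itself instead of a hard-coded 6
        int end = text.indexOf(endMarker, start); // only look after the start marker
        if (end < 0) {
            return Optional.empty(); // end marker not found after the start marker
        }
        return Optional.of(text.substring(start, end).trim());
    }

    public static void main(String[] args) {
        String ocrText = "header אבחנות diagnosis text הפניות referrals";
        // Prints "diagnosis text", or a fallback message if a marker is missing.
        System.out.println(between(ocrText, "אבחנות", "הפניות").orElse("<markers not found>"));
    }
}
```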

And there is a servlet containing the doPost() method:

    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        System.setProperty("jna.library.path",
                "32".equals(System.getProperty("sun.arch.data.model")) ? "lib/win32-x86" : "lib/win32-x86-64");
        response.setContentType("text/html;charset=UTF-8");

        // Create path components to save the file
        final String path = "C:\\Users\\Sherein Dabbah\\Desktop\\med"; // request.getParameter("destination");
        final Part filePart = request.getPart("file");
        final String fileName = filePart.getSubmittedFileName();

        OutputStream out = null;
        InputStream filecontent = null;
        PrintWriter writer = response.getWriter();

        // Compare strings with isEmpty(), not ==
        if (fileName == null || fileName.isEmpty()) {
            writer.println("You either did not specify a file to upload or are "
                    + "trying to upload a file to a protected or nonexistent "
                    + "location.");
            return;
        }
        String fullName = path + File.separator + fileName;
        try {
            File newFile = new File(fullName);
            out = new FileOutputStream(newFile);
            filecontent = filePart.getInputStream();

            int read = 0;
            final byte[] bytes = new byte[1024];
            while ((read = filecontent.read(bytes)) != -1) {
                out.write(bytes, 0, read);
            }
            writer.println("New file " + fileName + " created at " + path);
            LOGGER.log(Level.INFO, "File {0} being uploaded to {1}", new Object[]{fileName, path});
        } catch (FileNotFoundException fne) {
            writer.println("You either did not specify a file to upload or are "
                    + "trying to upload a file to a protected or nonexistent "
                    + "location.");
            writer.println("<br/> ERROR: " + fne.getMessage());
            LOGGER.log(Level.SEVERE, "Problems during file upload. Error: {0}", new Object[]{fne.getMessage()});
        } finally {
            if (out != null) {
                out.close();
            }
            if (filecontent != null) {
                filecontent.close();
            }
            if (writer != null) {
                writer.close();
            }
        }
        String s = new TesseractOCR().doOCR(fullName);
        System.out.println(s);
    }
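The manual copy loop in doPost can be written more compactly with java.nio.file.Files.copy and try-with-resources. The sketch below is a hedged alternative, not the servlet's original code: the helper name saveUpload is hypothetical, it keeps only the last path component of the submitted name (some older browsers send a full client-side path), and it returns the saved Path so it can be handed to the OCR step. Note that the posted doOCR(String file) ignores its argument and always reads a hard-coded image, so passing fullName to it currently has no effect.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class UploadSaver {

    /**
     * Copies the uploaded stream into targetDir and returns the saved file.
     * Only the last path component of submittedName is used, so names such as
     * "C:\\fakepath\\scan.jpg" or "../scan.jpg" stay inside targetDir.
     */
    static Path saveUpload(InputStream content, Path targetDir, String submittedName) throws IOException {
        try (InputStream in = content) { // the stream is closed on every path, including errors
            if (submittedName == null) {
                throw new IllegalArgumentException("no file name submitted");
            }
            // Strip any directory part, accepting both '/' and '\' separators.
            int cut = Math.max(submittedName.lastIndexOf('/'), submittedName.lastIndexOf('\\'));
            String baseName = submittedName.substring(cut + 1);
            if (baseName.isEmpty() || baseName.equals(".") || baseName.equals("..")) {
                throw new IllegalArgumentException("invalid file name: " + submittedName);
            }
            Path target = targetDir.resolve(baseName);
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("uploads");
        InputStream fake = new ByteArrayInputStream("fake image bytes".getBytes(StandardCharsets.UTF_8));
        Path saved = saveUpload(fake, dir, "C:\\fakepath\\scan.jpg");
        System.out.println(saved.getFileName()); // scan.jpg
    }
}
```

In the servlet, a call such as saveUpload(filePart.getInputStream(), Paths.get(path), fileName) would take the place of the try/finally block around the loop.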

I get an exception:

    Sep 06, 2015 10:36:46 AM org.apache.catalina.core.StandardWrapperValve invoke
    SEVERE: Servlet.service() for servlet [servlets.UploadServlet] in context with path [/up] threw exception [Servlet execution threw an exception] with root cause
    java.lang.Error: Invalid memory access
        at com.sun.jna.Native.invokePointer(Native Method)
        at com.sun.jna.Function.invokePointer(Function.java:470)
        at com.sun.jna.Function.invoke(Function.java:404)
        at com.sun.jna.Function.invoke(Function.java:315)
        at com.sun.jna.Library$Handler.invoke(Library.java:212)
        at com.sun.proxy.$Proxy4.TessBaseAPIGetUTF8Text(Unknown Source)
        at net.sourceforge.tess4j.Tesseract.getOCRText(Unknown Source)
        at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
        at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
        at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
        at classes.TesseractOCR.doOCR(TesseractOCR.java:28)
        at servlets.UploadServlet.doPost(UploadServlet.java:111)
        at...

It fails at the line String result = instance.doOCR(imageFile); in the TesseractOCR class.

You probably need to call setDatapath to tell Tess4J where to find the tessdata folder that holds the language data files.
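Before pointing setDatapath at a folder, it can help to confirm that the folder really contains a data file for every language passed to setLanguage. A missing data file is one plausible cause of a native failure like this one, not a confirmed diagnosis. Tesseract names these files <lang>.traineddata (for example heb.traineddata and eng.traineddata for "heb+eng"). Whether setDatapath expects the tessdata folder itself or its parent has differed between Tess4J/Tesseract versions, so check the documentation for your version; this JDK-only sketch, with the hypothetical helper missingLanguages, takes the tessdata folder itself.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class TessdataCheck {

    /**
     * Returns the languages from a Tesseract-style spec such as "heb+eng"
     * whose <lang>.traineddata file is missing from tessdataDir.
     * An empty list means every requested language file was found.
     */
    static List<String> missingLanguages(File tessdataDir, String langSpec) {
        List<String> missing = new ArrayList<>();
        for (String lang : langSpec.split("\\+")) {
            String code = lang.trim();
            if (code.isEmpty()) {
                continue; // tolerate stray '+' characters
            }
            File data = new File(tessdataDir, code + ".traineddata");
            if (!data.isFile()) {
                missing.add(code);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed location: pass the folder you intend to give to setDatapath.
        File tessdata = new File(args.length > 0 ? args[0] : "tessdata");
        if (!tessdata.isDirectory()) {
            System.out.println("Not a directory: " + tessdata.getAbsolutePath());
            return;
        }
        List<String> missing = missingLanguages(tessdata, "heb+eng");
        System.out.println(missing.isEmpty() ? "all language files present" : "missing: " + missing);
    }
}
```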

Also, you probably no longer need to set the jna.library.path property, because Tess4J can now extract and load its native libraries automatically.

The choice of language also matters here. I processed an image with lang = hin+eng and got the same error mentioned in this post.

Because the image contained little English text, I changed it to lang = hin and got the expected result:

    public static void main(String[] args) {
        Tesseract in = new ReadImageText().getTesseractInstance("C:/Program Files (x86)/Tesseract-OCR/tessdata/", "hin");
        try {
            String resultText = in.doOCR(new File("C:/EA/app-result/im/01-001/34/0.png"));
            log.info("resultText {}", resultText);
        } catch (TesseractException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }