High speed OCR and parallel processing

Question

0 votos

Hello,

I've written a number of programs to read tabular data in images using Matlab's ocr function. I have cleaned up the image files before using OCR (binarize, etc.). However it is taking about ~4 secs per 100 rows of single column data. Unfortunately I have hundreds of thousands of rows to work with so I need a way to speed this up. Using ROI or cropping the image into individual table cells didn't make much difference. Can someone help by pointing out some options?

Is there a way to make OCR run faster?
I have seen some documentation on parallel processing and was wondering if that could help. My computer has 4 cores. Should I explore the following?

hyperthreading
increase number of workers more than the number of cores
increase number of threads per worker.

In essence I'm looking to split the hundreds of image files to be processed separately and want to maximise the speed.

Thank you.

0 comentarios
Mostrar -2 comentarios más antiguos Ocultar -2 comentarios más antiguos

Iniciar sesión para comentar.

Iniciar sesión para responder a esta pregunta.

Follow Question

Answer 1

Walter Roberson el 29 de En. de 2017

0 votos

To get ocr() to maybe run faster you would need to train a custom network. This assumes that fonts and handwriting sloppiness are more restricted for your situation (eg. one font of one size) ; if you have a general written OCR problem then training your own network is not likely to speed anything up.

The general task of OCR could, I suspect, be done faster using different algorithms. I say that thinking about the speed of the automatic mail sorters. On the other hand those do not have to deal with hundreds of rows.

You need to profile your code. Hyperthreading is an advantage if you are waiting on I/O. If you are busy with computations then Hyperthreading can slow things down. Assigning more threads than cores or more workers than cores leads to contention for resources unless they typically spend a lot of time waiting for interrupts.

parfor and SPMD are not always more productive. They are most effective for low IO high computation where the matrices involved are small or moderate and you do not do extensive tasks such as eigenvalues or \ operation. With larger matrices and vectorized code especially code that does linear algebra then you would typically get better performance leaving it not explicitly parallel so that it can use the multithreaded high performance libraries (those have much lower overhead than creating workers)

1 comentario
Mostrar -1 comentarios más antiguos Ocultar -1 comentarios más antiguos

Azura Hashim el 2 de Feb. de 2017

Thanks for your help!

Iniciar sesión para comentar.

High speed OCR and parallel processing

0 comentarios
Mostrar -2 comentarios más antiguos Ocultar -2 comentarios más antiguos

Respuesta aceptada

1 comentario
Mostrar -1 comentarios más antiguos Ocultar -1 comentarios más antiguos

Más respuestas (0)

Categorías

Productos

Etiquetas

Community Treasure Hunt

High speed OCR and parallel processing

0 comentarios Mostrar -2 comentarios más antiguos Ocultar -2 comentarios más antiguos

Respuesta aceptada

1 comentario Mostrar -1 comentarios más antiguos Ocultar -1 comentarios más antiguos

Más respuestas (0)

Categorías

Productos

Etiquetas

Ver también

Community Treasure Hunt

0 comentarios
Mostrar -2 comentarios más antiguos Ocultar -2 comentarios más antiguos

1 comentario
Mostrar -1 comentarios más antiguos Ocultar -1 comentarios más antiguos