Identify and remove text from graph

Question

I'm in the process of designing a semi-automatic graph digitizer that can convert screenshots of a graph into a CSV file with the data.

I'm running into an issue when certain graphs contain characters and text boxes in the graph space, as in the example images below:

After removing the gridlines and producing a binary image "curve" (that's supposed to contain just the curve), my current code uses

[y, x] = find(curve)

to determine the x and y position of points on the curve. However, if there is any text on the graph, the code thinks it's part of the data curve and finds points in its location. As in the above examples, the code "finds" datapoints where the 'Conditions' and '3C96' text is located.

My question is: Is there any way to automatically detect and mask over this text, perhaps using OCR? Or is automating it a lost cause, and I should instead manually crop out any text initially?


Answer 1

Image Analyst on 23 Jul 2021

Edited: Image Analyst on 23 Jul 2021

If the graphs are digital images, you can simply use bwareafilt() in the Image Processing Toolbox to take either the largest blob, or blobs within a certain size range. You might need to add some code options to handle cases where grid lines can be solid or dashed, whether text is in boxes or not, or whether you want the gridlines to remain in the output image or not.
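A minimal sketch of that suggestion, assuming the curve is the largest connected blob in the binary image and that text characters are much smaller. The synthetic image, the variable names curveOnly and curveInRange, and the area threshold of 50 pixels are illustrative assumptions, not part of the original script. It requires the Image Processing Toolbox.

```matlab
% Synthetic stand-in for the binary image "curve": one long curve plus
% two small blobs that play the role of text characters.
curve = false(100, 200);
cols = 10:190;
rows = round(50 + 30*sin(cols/30));                 % gentle slope keeps the curve 8-connected
curve(sub2ind(size(curve), rows, cols)) = true;     % the data curve (181 pixels)
curve(10:14, 150:153) = true;                       % text-like blob (20 pixels)
curve(10:14, 156:159) = true;                       % text-like blob (20 pixels)

% Option 1: keep only the single largest connected component.
curveOnly = bwareafilt(curve, 1);

% Option 2: keep every component whose area lies in a size range.
% The lower bound of 50 pixels is an assumed threshold; tune it per graph style.
curveInRange = bwareafilt(curve, [50 numel(curve)]);

% Extract point coordinates from the filtered image, as in the question.
[y, x] = find(curveOnly);
```

Option 1 breaks down if the curve is dashed or split into several pieces, since only one piece survives. In that case the size-range form is probably the safer starting point, provided the text blobs are reliably smaller than the curve segments.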

By the way, there are File Exchange entries already:

https://www.mathworks.com/matlabcentral/fileexchange/?term=tag%3A%22digitize%22

Carl Youel on 28 Jul 2021

I'll give that a shot. Seems like there won't be an easy solution given the variability of graph styles.

I've seen the File Exchange scripts, and they are indeed helpful to a certain degree. However, it seems all of them require you to place clicks point by point. My script seeks to do this process semi-automatically, where the user need only upload their graph and select a few options. I hope to post it on the File Exchange when I end up finishing it.

Thanks again

Image Analyst on 28 Jul 2021

If you have a consistent type of graph you're dealing with, then it's possible to do it automatically. If, however, the graphs are wildly different, then developing an automatic solution may be more trouble than it's worth, and it's best just to have user-assisted processing.
