Conversion of PDF data to Spreadsheet

Shaili Bulusu on 24 May 2017
Commented: Guillaume on 25 May 2017
Hi, I have several reports in PDF format. I would like to write a MATLAB script to capture the data into a spreadsheet. I thought the best method would be to add all the headers to an array, write each PDF page's data to a separate sheet in Excel, and then populate the fields with the values corresponding to the headers. Is there a better way to achieve this?
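The approach described above could be sketched roughly as follows. This is only an illustration under assumptions: the header names, the per-page structs, and the output file name are all hypothetical, since the thread does not say what the PDF-reading script returns. It lines up each page's values against a fixed header list and writes one sheet per page with `writetable`.

```matlab
% Hypothetical header list; the real one would come from the reports.
headers = {'ReportID', 'Date', 'Total'};

% Hypothetical per-page data, standing in for the output of the
% PDF-reading script (its actual format is not given in the thread).
pages = { ...
    struct('ReportID', 'A-001', 'Date', '2017-05-01', 'Total', 12.5), ...
    struct('ReportID', 'A-002', 'Total', 7.0)};   % no 'Date' on page 2

outFile = fullfile(tempdir, 'reports_sketch.xlsx');  % hypothetical file name
if exist(outFile, 'file')
    delete(outFile);   % start from a clean workbook
end

for p = 1:numel(pages)
    % Build one row whose columns follow the order of the header list.
    row = cell(1, numel(headers));
    for h = 1:numel(headers)
        if isfield(pages{p}, headers{h})
            row{h} = pages{p}.(headers{h});
        else
            row{h} = '';   % leave the cell blank when the page has no value
        end
    end

    % One table per page, written to its own sheet.
    T = cell2table(row, 'VariableNames', headers);
    writetable(T, outFile, 'Sheet', sprintf('Page%d', p));
end
```

Whether one sheet per page is the right layout depends on how the data will be used afterwards; if every page shares the same headers, appending all rows to a single table is an equally plausible choice.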

Answers (1)

Guillaume on 24 May 2017
Well, your first hurdle will be capturing the data from the PDF. There is no built-in tool for this in MATLAB, and depending on the structure of the PDF this will be either a fair amount of work (the data is actually stored as continuous text in the file) or extremely hard (the data is stored as text but scattered through the file, or it is just an image of the text, which will require OCR).
PDF is not really designed to transfer structured data to a computer. It's mostly meant to be read by a human.
  2 comments
Guillaume on 25 May 2017
Shaili Bulusu's comment posted as an answer moved here:
I understand the difficulties, but I have a script that will read the data from the PDF for me. My question is whether storing the headers in an array is a sound approach, or whether there is a better way to capture the data into a spreadsheet.
Guillaume on 25 May 2017
More details on what you mean by storing the headers as an array would be required to answer your question. What form do the inputs come in, and what form of output do you want?
