readPDFFormData

Read data from PDF forms

Description

data = readPDFFormData(filename) reads the data from a PDF form into a struct.

data = readPDFFormData(filename,'Password',password) specifies the password for opening the PDF form.

Examples

Read the data from the form fields in weatherReportForm1.pdf using readPDFFormData. The function returns a struct containing the data from the PDF form fields.

filename = "weatherReportForm1.pdf";
data = readPDFFormData(filename)
data = struct with fields:
         event_type: "Thunderstorm Wind"
    event_narrative: "Large tree down between Plantersville and Nettleton."

Read the data from the form fields in multiple files using a file datastore.

Create a file datastore for the weather report forms. The forms are named "weatherReportFormN.pdf", where N is the number of the form. Use the wildcard "*" in the file name to match all files with this naming pattern. To use readPDFFormData as the read function, pass it to fileDatastore as a function handle.

fds = fileDatastore("weatherReportForm*.pdf",'ReadFcn',@readPDFFormData)
fds = 
  FileDatastore with properties:

                       Files: {
                              ' .../tp50688320/textanalytics-ex39762425/weatherReportForm1.pdf';
                              ' .../tp50688320/textanalytics-ex39762425/weatherReportForm2.pdf';
                              ' .../tp50688320/textanalytics-ex39762425/weatherReportForm3.pdf'
                               ... and 1 more
                              }
                     Folders: {
                              '/tmp/Bdoc23b_2361005_1099857/tp50688320/textanalytics-ex39762425'
                              }
                 UniformRead: 0
                    ReadMode: 'file'
                   BlockSize: Inf
                  PreviewFcn: @readPDFFormData
      SupportedOutputFormats: ["txt"    "csv"    "xlsx"    "xls"    "parquet"    "parq"    "png"    "jpg"    "jpeg"    "tif"    "tiff"    "wav"    "flac"    "ogg"    "opus"    "mp4"    "m4a"]
                     ReadFcn: @readPDFFormData
    AlternateFileSystemRoots: {}

Loop over the files in the datastore and read each PDF form.

data = [];
while hasdata(fds)
    formData = read(fds);       % struct holding the fields of one form
    data = [data; formData];    % append to the struct array
end
data
data=4×1 struct array with fields:
    event_type
    event_narrative
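
For further analysis, it can be convenient to hold the collected form data in a table. The following is a minimal sketch, assuming every form exposes the same fields. The struct array is hand-built stand-in data: only the first entry mirrors the example output above, and the remaining values are hypothetical placeholders.

```matlab
% Stand-in for the struct array produced by the loop above.
% Only the first entry mirrors weatherReportForm1.pdf; the rest are hypothetical.
data = struct( ...
    'event_type', {"Thunderstorm Wind"; "Hail"; "Flood"; "Heavy Rain"}, ...
    'event_narrative', {"Large tree down between Plantersville and Nettleton."; ...
                        "Hypothetical narrative 2."; ...
                        "Hypothetical narrative 3."; ...
                        "Hypothetical narrative 4."});

% Convert the 4-by-1 struct array to a table with one row per form
% and one variable per form field.
tbl = struct2table(data);
```

This conversion relies on all structs sharing the same field names, which holds here because the forms share one layout.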

Input Arguments

filename is the name of the PDF form file, specified as a string scalar or character vector.

readPDFFormData supports AcroForm PDF files (interactive forms) only.

Data Types: string | char

password is the password to open the PDF file, specified as a character vector or a string scalar.

Example: "skroWhtaM"

Data Types: string | char

Output Arguments

data is the output, returned as a struct. The fields of data correspond to the names of the form fields in the PDF. If a form field name is not a valid struct field name, then the function automatically edits it to construct a valid name.
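
This page does not state the exact renaming rule. As a rough illustration only, the MATLAB function matlab.lang.makeValidName shows the kind of conversion needed to turn arbitrary labels into valid field names. Whether readPDFFormData uses this function internally is an assumption, not something this page confirms, and the field labels below are hypothetical.

```matlab
% Hypothetical form field names that are not valid struct field names.
rawNames = ["event type", "2nd-observer", "wind speed (mph)"];

% Convert them to valid MATLAB identifiers. This only illustrates the idea;
% readPDFFormData may apply a different rule.
validNames = matlab.lang.makeValidName(rawNames);

% Check that every converted name can be used as a struct field name.
allValid = all(arrayfun(@isvarname, validNames));
```

If the field names in the output do not match the labels you see in the PDF, inspecting fieldnames(data) shows the names that were actually constructed.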

Version History

Introduced in R2018a