ocr

Recognize text using optical character recognition

Syntax

txt = ocr(I)

txt = ocr(I,roi)

txt = ocr(ds)

txt = ocr(___,Name=Value)

Description

txt = ocr(I) returns an ocrText object that contains optical character recognition (OCR) information from the input image I. The object contains recognized characters, words, text lines, the locations of recognized words, and a metric indicating the confidence of each recognition result.

txt = ocr(I,roi) recognizes text in I within one or more rectangular regions.

txt = ocr(ds) returns a cell array of ocrText objects, each containing the recognition results for the ROIs specified in the datastore, ds, for the corresponding image. Use this syntax to perform OCR on a collection of images. By default, the ocr function assumes that each ROI contains only a single line of text. To process ROIs that may contain multiple lines of text, set the LayoutAnalysis name-value argument to "block".

txt = ocr(___,Name=Value) specifies options using one or more name-value arguments in addition to any combination of arguments from previous syntaxes. For example, LayoutAnalysis="page" treats the image as a page containing blocks of text.

Examples

Recognize Text Within an Image

Load an image containing text into the workspace and perform OCR on it.

businessCard = imread("businessCard.png");
ocrResults = ocr(businessCard)

ocrResults = 
  ocrText with properties:

                      Text: '4 MathWorks:↵↵ ↵↵The MathWorks, Inc.↵↵-3 Apple Hill Drive↵Natick, MA 01760-2098↵USA↵↵www.mathworks.com↵↵ ↵↵'
    CharacterBoundingBoxes: [107×4 double]
      CharacterConfidences: [107×1 single]
                     Words: {16×1 cell}
         WordBoundingBoxes: [16×4 double]
           WordConfidences: [16×1 single]
                 TextLines: {8×1 cell}
     TextLineBoundingBoxes: [8×4 double]
       TextLineConfidences: [8×1 single]

Display the image and overlay the recognized text.

recognizedText = ocrResults.Text;
figure
imshow(businessCard)
text(600,150,recognizedText,BackgroundColor=[1 1 1]);

Recognize Text in Regions of Interest (ROIs)

Read an image into the workspace.

I = imread("handicapSign.jpg");

Define one or more rectangular regions of interest in which to recognize text within the input image.

roi = [370 246 363 423];

Alternatively, you can use drawrectangle to select a region using a mouse. For example:

figure; imshow(I)
h = drawrectangle;
roi = round(h.Position)

Recognize text within the ROI.

ocrResults = ocr(I,roi);

Insert the recognized text into the original image. Display the image with the inserted recognized text.

Iocr = insertText(I,roi(1:2),ocrResults.Text,AnchorPoint="RightTop",FontSize=16);
figure
imshow(Iocr)

Recognize Digits from Seven-Segment Display

Read an image containing a seven-segment display into the workspace.

I = imread("sevSegDisp.jpg");

Specify the ROI that contains the seven-segment display.

roi = [506 725 1418 626];

To recognize the digits from the seven-segment display, specify the Model argument as "seven-segment".

ocrResults = ocr(I,roi,Model="seven-segment");

Display the recognized digits and detection confidence.

fprintf("Recognized seven-segment digits: ""%s""\nDetection confidence: %0.4f",cell2mat(ocrResults.Words),ocrResults.WordConfidences)

Recognized seven-segment digits: "5405.9"
Detection confidence: 0.7948

Insert the recognized digits into the image.

Iocr = insertObjectAnnotation(I,"rectangle", ...
            ocrResults.WordBoundingBoxes,ocrResults.Words,LineWidth=5,FontSize=72);
figure
imshow(Iocr)

Display Bounding Boxes of Words and Recognition Confidences

Read an image containing text into the workspace.

businessCard = imread("businessCard.png");
ocrResults = ocr(businessCard)

ocrResults = 
  ocrText with properties:

                      Text: '4 MathWorks:↵↵ ↵↵The MathWorks, Inc.↵↵-3 Apple Hill Drive↵Natick, MA 01760-2098↵USA↵↵www.mathworks.com↵↵ ↵↵'
    CharacterBoundingBoxes: [107×4 double]
      CharacterConfidences: [107×1 single]
                     Words: {16×1 cell}
         WordBoundingBoxes: [16×4 double]
           WordConfidences: [16×1 single]
                 TextLines: {8×1 cell}
     TextLineBoundingBoxes: [8×4 double]
       TextLineConfidences: [8×1 single]

Iocr = insertObjectAnnotation(businessCard,"rectangle", ...
                           ocrResults.WordBoundingBoxes, ...
                           ocrResults.WordConfidences);
figure
imshow(Iocr)

Find and Highlight Text in Image

Load an image containing text into the workspace, perform OCR, and highlight each occurrence of the text "Math".

businessCard = imread("businessCard.png");
ocrResults = ocr(businessCard);
bboxes = locateText(ocrResults,"Math",IgnoreCase=true);
Iocr = insertShape(businessCard,"FilledRectangle",bboxes);
figure
imshow(Iocr)

Evaluate Accuracy of OCR Model

This example shows how to evaluate the accuracy of an OCR model that recognizes seven-segment numerals. The evaluation dataset contains images of energy meter displays that show seven-segment numerals.

Download and extract the dataset.

datasetURL = "https://ssd.mathworks.com/supportfiles/vision/data/7SegmentImages.zip";
datasetZip = "7SegmentImages.zip";
if ~exist(datasetZip,"file")
    disp("Downloading evaluation dataset (" + datasetZip + " - 96 MB) ...");
    websave(datasetZip,datasetURL);
end

datasetFiles = unzip(datasetZip);

Load the evaluation ground truth.

ld = load("7SegmentGtruth.mat");
gTruth = ld.gTruth;

Create datastores that contain images, bounding boxes and text labels from the groundTruth object using the ocrTrainingData function with the label and attribute names used during labeling.

labelName = "Text";
attributeName = "Digits";
[imds,boxds,txtds] = ocrTrainingData(gTruth,labelName,attributeName);

Combine the datastores.

cds = combine(imds,boxds,txtds);

Run OCR on the evaluation dataset.

results = ocr(cds, Model="seven-segment");

Evaluate the OCR results against the ground truth.

metrics = evaluateOCR(results,cds);

Evaluating ocr results
----------------------
* Selected metrics: character error rate, word error rate.
* Processed 119 images.
* Finalizing... Done.
* Data set metrics:

    CharacterErrorRate    WordErrorRate
    __________________    _____________

         0.082195            0.19958

Display accuracy of the OCR model.

modelAccuracy = 100*(1-metrics.DataSetMetrics.CharacterErrorRate);
disp("Accuracy of the OCR model= " + modelAccuracy + "%")

Accuracy of the OCR model= 91.7805%

Input Arguments

`I` — Input image
M-by-N-by-3 truecolor image | M-by-N 2-D grayscale image | M-by-N binary image

Input image, specified as an M-by-N-by-3 truecolor image, an M-by-N grayscale image, or an M-by-N binary image. The input image must consist of real, nonsparse values and must be less than 2^15 (32,768) pixels in both width and height. Before performing character recognition, the function converts truecolor and grayscale input images into binary images using Otsu's thresholding technique.

For best OCR results, specify an image in which the height of a lowercase x, or a comparable character, is greater than 20 pixels. To improve results, also correct any text rotation of more than 10 degrees in either direction from the horizontal or vertical axis.
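The input constraints described above can be checked before calling ocr. The following is a minimal sketch, not part of the ocr function itself. The variable names and the synthetic image are illustrative assumptions, and the size limit uses the 32,768-pixel value stated above.

```matlab
% Synthetic stand-in for an input image (assumption: in practice this
% would be the output of imread).
I = zeros(480,640,3,"uint8");

maxDim = 32768;                 % size limit stated for ocr input images
[h,w,c] = size(I);

% Real, nonsparse values.
isValidType = isreal(I) && ~issparse(I);

% M-by-N (grayscale or binary) or M-by-N-by-3 (truecolor).
isValidShape = ismatrix(I) || (ndims(I) == 3 && c == 3);

% Width and height below the limit.
isValidSize = h < maxDim && w < maxDim;

canRunOcr = isValidType && isValidShape && isValidSize;
```

If canRunOcr is false, resizing or converting the image before calling ocr is one way to proceed.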

`roi` — Rectangular regions of interest
M-by-4 matrix

Rectangular regions of interest, specified as an M-by-4 matrix. Each row specifies a region of interest within the input image in the form [x y width height], where [x y] specifies the upper-left corner location, and [width height] specifies the size of the rectangular region of interest, in pixels. Each rectangle must be fully contained within the input image I. Before the recognition process, the function uses Otsu’s thresholding technique to convert truecolor and grayscale input regions of interest to binary regions.

To obtain best results when using ocr to recognize seven-segment digits, specify an roi that encloses only the part of the image that contains seven-segment digits.
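Because each rectangle must lie fully inside the image, it can help to clip candidate regions before passing them to ocr. The sketch below is one possible approach under stated assumptions: the image size and ROI values are hypothetical, and a box [x y width height] is assumed to cover pixels x through x+width-1 and y through y+height-1.

```matlab
imageSize = [480 640];          % hypothetical [rows cols] of the image

roi = [ 10  20 100  50;         % fully inside the image
       600 450 100  80;         % extends past the right and bottom edges
        -5  -5  30  30];        % starts outside the upper-left corner

% Clamp the corners of each box to the image limits.
x1 = max(roi(:,1),1);
y1 = max(roi(:,2),1);
x2 = min(roi(:,1) + roi(:,3) - 1,imageSize(2));
y2 = min(roi(:,2) + roi(:,4) - 1,imageSize(1));

% Convert back to [x y width height] and drop boxes with no area left.
clipped = [x1 y1 x2-x1+1 y2-y1+1];
clipped = clipped(clipped(:,3) > 0 & clipped(:,4) > 0,:);
```

The clipped matrix can then be used as the roi argument for an image of the assumed size.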

`ds` — Evaluation data
datastore object

Evaluation data, specified as a datastore. When passed to the read function, the datastore must return a cell array or a table with at least these two columns:

1st Column — A cell vector of logicals, grayscale images, or RGB images.
2nd Column — A cell vector that contains numROIs-by-4 matrices of bounding boxes of the form [x y width height], which specify text locations within each image.

When using a datastore input, the LayoutAnalysis name-value argument value must be either "auto" or "none".
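As an illustration of the two-column layout, the sketch below builds a combined datastore from an imageDatastore and an arrayDatastore. It is only one possible construction: the blank placeholder images, file names, and box coordinates are assumptions made for the example, and real data would contain text.

```matlab
% Write two blank placeholder images to a temporary folder (assumption:
% real evaluation data would be images that contain text).
folder = tempname;
mkdir(folder);
imwrite(255*ones(100,300,"uint8"),fullfile(folder,"img1.png"));
imwrite(255*ones(100,300,"uint8"),fullfile(folder,"img2.png"));

% First column: the images.
imds = imageDatastore(folder);

% Second column: one numROIs-by-4 matrix of [x y width height] per image.
boxes = {[10 10 200 40]; [10 10 120 40; 10 55 200 40]};
boxds = arrayDatastore(boxes,OutputType="same");

% Each read of the combined datastore returns {image, boxes}.
ds = combine(imds,boxds);
sample = read(ds);
reset(ds);
```

A datastore built this way has the shape that the txt = ocr(ds) syntax expects. The evaluation example on this page instead obtains its datastores from ocrTrainingData.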

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: txt = ocr(I,LayoutAnalysis="page") treats the text in the image as a page containing blocks of text.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

`LayoutAnalysis` — Type of layout analysis to perform for text segmentation
`"auto"` (default) | `"page"` | `"block"` | `"line"` | `"word"` | `"character"` | `"none"`

Type of layout analysis to perform for text segmentation, specified as one of these options:

LayoutAnalysis Value	Text Treatment
`"auto"`	If the `Model` argument value is `"seven-segment"`, treats the text in the image as though the `LayoutAnalysis` value is `"block"`. If you specify the `ds` input argument, treats the text in the image as though the `LayoutAnalysis` value is `"none"`. Otherwise, the function treats the text in the image as `"page"`.
`"page"`	Treats the text in the image as a page containing blocks of text.
`"block"`	Treats the text in the image as a single block of text.
`"line"`	Treats the text in the image as a single line of text.
`"word"`	Treats the text in the image as a single word of text.
`"character"`	Treats the text in the image as a single character.
`"none"`	Does not perform layout analysis. Best used for images with a single line of text.

You can use the LayoutAnalysis argument to determine the layout of the text within the input image. For example, you can set LayoutAnalysis to "page" to recognize text from a scanned document that contains a specific format, such as a double column. This setting preserves the reading order in the returned text.

If your input image contains a few regions of text, or the text is located in a cluttered scene, the ocr function can return poor quality results. If you get poor OCR results, try a different layout that better matches the text in your image. If the text is located in a cluttered scene, try specifying an ROI around the text in your image in addition to trying a different layout.
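One way to try different layouts is to loop over several LayoutAnalysis values and compare the results. The sketch below uses the businessCard.png image from the examples on this page. Ranking layouts by mean word confidence is a heuristic assumed here for illustration, not a documented selection rule.

```matlab
I = imread("businessCard.png");

layouts = ["page" "block" "line" "word"];
meanConf = zeros(size(layouts));

for k = 1:numel(layouts)
    result = ocr(I,LayoutAnalysis=layouts(k));
    if ~isempty(result.WordConfidences)
        % Average confidence over all recognized words for this layout.
        meanConf(k) = mean(result.WordConfidences,"omitnan");
    end
end

[~,best] = max(meanConf);
disp("Highest mean word confidence: " + layouts(best))
```

Inspecting the recognized text for each layout is still advisable, because a high confidence value does not guarantee the correct reading order.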

`Model` — Model to use for recognition
`"english"` (default) | `"japanese"` | `"seven-segment"` | `"<ModelName>-fast"` | character vector | string scalar | cell array of character vectors | string array

Model to use for recognition, specified as one of these options:

"english", "japanese", "seven-segment" — These specify the built-in models for detecting English text, Japanese text, or seven-segment digits, respectively.
Character vector or string scalar — Use this option to specify a custom model or one of the additional language models included in the OCR Language Data Files support package.
Cell array of character vectors or string array — Use this option to specify multiple models to use for detection simultaneously.

For faster performance using the built-in models (including any additional installed language models), you can append -fast to the language model string. For example, "english-fast", "japanese-fast", or "seven-segment-fast".

You can also install additional models or add a custom model. For details, see Install OCR Language Data Files.

Specifying multiple languages enables simultaneous recognition of all the selected languages. However, selecting more than one language can reduce the accuracy of the ocr function and increase its processing time. To specify any of the additional languages contained in the OCR Language Data Files support package, use the same syntax as for the built-in languages. You do not need to specify the path. For example, to specify Finnish text recognition:

txt = ocr(img,Model="finnish");
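The built-in models can be combined or switched to their fast variants in the same way. The following sketch uses only the built-in model names listed above and the businessCard.png image from the examples. The variable names are illustrative.

```matlab
I = imread("businessCard.png");

% Faster variant of the built-in English model.
txtFast = ocr(I,Model="english-fast");

% Simultaneous recognition with two built-in models. This can cost
% accuracy and processing time compared with a single model.
txtMulti = ocr(I,Model=["english" "japanese"]);

fprintf("english-fast: %d words, english+japanese: %d words\n", ...
    numel(txtFast.Words),numel(txtMulti.Words));
```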

To use your own custom model, such as one created using the trainOCR function, use this syntax:

ocrResults = ocr(I,Model=ocrModel);

where ocrModel contains the full path of the model.

For deployment targets generated by MATLAB® Coder™: The generated OCR executable and language model file folder must be colocated.

For English: C:/path/to/eng.traineddata
For Japanese: C:/path/to/jpn.traineddata
For Seven-segment: C:/path/to/seven_segment.traineddata
For custom data files: C:/path/to/customlang.traineddata
C:/path/ocr_app.exe

You can copy the English, Japanese and seven-segment trained data files from this folder:

fullfile(matlabroot,"toolbox","vision","visionutilities","tessdata_best");

`CharacterSet` — Character subset
`""` (default) | character vector | string scalar

Character subset, specified as a character vector or string scalar. By default, CharacterSet is set to the empty string "", which directs the function to search for all characters in the language model specified by the Model name-value argument. You can set this value to a smaller set of known characters to constrain the classification process.

The ocr function selects the best match for detected text from the CharacterSet. Using deducible knowledge about the characters in the input image helps to improve text recognition accuracy. For example, if you set CharacterSet to all numeric digits, "0123456789", the function attempts to match each character to only digits. In this case, the ocr function can incorrectly recognize a non-digit character as a digit.

If you specify the Model as "seven-segment", the ocr function uses the CharacterSet value "0123456789.:-".
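The effect of constraining the character set can be seen on a small synthetic image. This is a minimal sketch under stated assumptions: the canvas size, text, and font size are arbitrary choices made here, and the image is generated with insertText rather than taken from real data.

```matlab
% Render digits as dark text on a white canvas (synthetic test image).
canvas = 255*ones(120,500,3,"uint8");
I = insertText(canvas,[20 20],"01760 2098",FontSize=48,BoxOpacity=0);

% Restrict recognition to numeric digits and treat the image as one line.
txt = ocr(I,CharacterSet="0123456789",LayoutAnalysis="line");

% Remove whitespace and newlines before inspecting the characters.
recognized = txt.Text(~isspace(txt.Text));
disp(recognized)
```

With this setting, any non-digit mark in an image would also be forced to the closest digit, so the constraint is most useful when the content is known in advance.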

Output Arguments

`txt` — Recognized text and metrics
`ocrText` object | array of `ocrText` objects | cell array of `ocrText` objects

Recognized text and metrics, returned as an ocrText object, array of ocrText objects, or a cell array of ocrText objects. Each object contains the recognized text and the location of the recognized text for an input image, as well as metrics indicating the confidence of the results. Confidence values are in the range [0,1], representing a percent probability. The shape of the txt output depends on whether you specify an entire single image or single image with one ROI, a single image with multiple ROIs, or a datastore to the ocr function.
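The sketch below illustrates the multiple-ROI case and one way to use the confidence values. The two ROIs simply split the example image into upper and lower halves, and the 0.8 threshold is an assumed value chosen for illustration.

```matlab
I = imread("businessCard.png");
[h,w,~] = size(I);

% Two ROIs covering the upper and lower halves of the image.
half = floor(h/2);
roi = [1 1 w half; 1 half+1 w h-half];

% With multiple ROIs, the output holds one ocrText object per ROI row.
txt = ocr(I,roi);

minConfidence = 0.8;            % assumed threshold
for k = 1:numel(txt)
    keep = txt(k).WordConfidences >= minConfidence;
    confidentWords = txt(k).Words(keep);
    fprintf("ROI %d: kept %d of %d words\n", ...
        k,numel(confidentWords),numel(txt(k).Words));
end
```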

Tips

Optical character recognition (OCR) language data files provide pretrained language models for the Tesseract OCR engine, enabling accurate and efficient text extraction in various languages. These files are designed for integration with Computer Vision Toolbox™, allowing you to leverage advanced OCR capabilities across multiple languages. For step-by-step guidance on installing these language data files, enabling third-party language support, and using the pretrained models with the ocr function for multilingual text recognition, see Install OCR Language Data Files. For an overview of OCR workflows and basic usage, see Getting Started with OCR.

If your OCR results are not what you expect, try one or more of these options:

Increase the image size by 2–4 times.
If the characters in the image are too close together or their edges are touching, use morphology to thin out the characters, which helps create space between them.
Use binarization to check for non-uniform lighting issues. Use the graythresh and imbinarize functions to binarize the image. If the characters are not visible in the results of the binarization, then the image has a potential non-uniform lighting issue. Try top-hat filtering, using the imtophat function, or other techniques that deal with removing non-uniform illumination.
Use the roi argument to isolate the text. You can specify the roi manually or use text detection.
If your image looks like a natural scene containing words, such as a street scene, rather than a scanned document, try setting the LayoutAnalysis argument to either "block" or "word".
Ensure that the image contains dark text on a light background. To achieve this, you can binarize the image and invert it before passing it to the ocr function.
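Several of the options above can be combined into a short preprocessing step. The following is a sketch, not a recommended pipeline. It assumes Image Processing Toolbox is available, uses a 2x scale factor, and decides whether to invert by assuming that the background covers most of the image.

```matlab
I = imread("businessCard.png");

% Convert to grayscale and enlarge the image by a factor of 2.
gray = im2gray(I);
gray = imresize(gray,2);

% Binarize using the default global (Otsu) threshold.
bw = imbinarize(gray);

% Heuristic (assumption): if most pixels are dark, the text is probably
% light on a dark background, so invert to get dark text on light.
if mean(bw(:)) < 0.5
    bw = ~bw;
end

txt = ocr(bw,LayoutAnalysis="block");
disp(txt.Text)
```

Displaying bw with imshow before calling ocr is a quick way to check whether the characters survive binarization or whether non-uniform lighting needs to be corrected first.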

References

[1] Smith, R. "An Overview of the Tesseract OCR Engine." In Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), 629–33. IEEE, 2007. https://doi.org/10.1109/ICDAR.2007.4376991.

[2] Smith, R., D. Antonova, and D. Lee. "Adapting the Tesseract Open Source OCR Engine for Multilingual OCR." Proceedings of the International Workshop on Multilingual OCR, 2009.

[3] Smith, R. "Hybrid Page Layout Analysis via Tab-Stop Detection." Proceedings of the 10th International Conference on Document Analysis and Recognition, 2009.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

LayoutAnalysis, Model, and CharacterSet must be compile-time constants.
Generated code for this function requires that the target platform has Tesseract and Leptonica installed. By default, generated code uses precompiled, platform-specific shared libraries. To generate portable code without relying on precompiled libraries, configure MATLAB Coder as follows:
```
cfg = coder.config("lib");
cfg.TargetLang = "C++";
cfg.UsePrecompiledLibraries = "Avoid";
```
You can also generate optimized C/C++ code using Embedded Coder™, provided Tesseract and Leptonica are installed on the target platform.

GPU Code Generation
Generate CUDA® code for NVIDIA® GPUs using GPU Coder™.

Refer to the usage notes and limitations in the C/C++ Code Generation section. The same usage notes and limitations apply to GPU code generation.

Version History

Introduced in R2014a

R2024b: Generate optimized C/C++ code using Embedded Coder

Starting in R2024b, you can generate optimized C/C++ code for ARM processors using Embedded Coder.

R2023a: `Language` and `TextLayout` name-value arguments removed

The Language and TextLayout name-value arguments have been removed. Use the Model and LayoutAnalysis name-value arguments instead.
