extractTextEmbeddings

Extract text embeddings from search text using CLIP network text encoder

Since R2026a

    Description

    Add-On Required: This feature requires the Computer Vision Toolbox Model for OpenAI CLIP Network add-on.

    textEmbeddings = extractTextEmbeddings(clip,text) extracts text embeddings from the search text text using the text encoder of a Contrastive Language-Image Pre-Training (CLIP) network, clip, by running a forward pass on the neural network.

    Note

    This functionality requires Deep Learning Toolbox™.

    textEmbeddings = extractTextEmbeddings(clip,text,Name=Value) specifies options using one or more name-value arguments. For example, MiniBatchSize=32 limits the batch size to 32 text strings.

    Examples

    Create a pretrained CLIP network with a ViT-B/16 backbone.

    clip = clipNetwork("vit-b-16");

    Define a text search term.

    search = "A photo of a children's book.";

    Extract the text embeddings for the search term using the CLIP network encoder.

    textEmbeddings = extractTextEmbeddings(clip,search);

    Display the size of the text embeddings.

    size(textEmbeddings)
    ans = 1×2
    
       512     1
    
    

    Input Arguments

    clip

    CLIP network, specified as a clipNetwork object.

    text

    Input text, specified as a B-element string array or a datastore containing B strings. B is the number of strings in the batch. You must specify the text in English using ASCII characters. The function automatically pads or truncates each text input so that it contains exactly 77 tokens.
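As a sketch of batched input, assuming the CLIP add-on and Deep Learning Toolbox are installed, a three-element string array produces one embedding column per string:

```matlab
% Assumes the Computer Vision Toolbox Model for OpenAI CLIP Network
% add-on is installed.
clip = clipNetwork("vit-b-16");

% A batch of B = 3 search strings (English, ASCII).
text = ["a photo of a dog", ...
        "a photo of a cat", ...
        "a watercolor painting of a city"];

% Each string is padded or truncated to 77 tokens internally.
textEmbeddings = extractTextEmbeddings(clip,text);
size(textEmbeddings)   % 512-by-3 for the ViT-B/16 backbone
```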

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: extractTextEmbeddings(clip,text,MiniBatchSize=32) limits the batch size to 32 text strings.

    MiniBatchSize

    Number of strings in each batch, specified as a positive integer. Larger batch sizes reduce processing time, but require more memory.
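For a large batch, a smaller MiniBatchSize bounds memory use at the cost of more forward passes; a sketch with a hypothetical set of 1000 captions:

```matlab
clip = clipNetwork("vit-b-16");

% Hypothetical batch of 1000 caption strings.
captions = "a photo of object " + string(1:1000);

% Process 32 strings per forward pass to limit memory use.
textEmbeddings = extractTextEmbeddings(clip,captions,MiniBatchSize=32);
```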

    Hardware resource on which to run the network, specified as "auto", "gpu", or "cpu". The table shows the valid hardware resource values.

    Resource   Action
    "auto"     Use a GPU if it is available. Otherwise, use the CPU.
    "gpu"      Use the GPU. To use a GPU, you must have Parallel Computing Toolbox™ and a CUDA®-enabled NVIDIA® GPU. If a suitable GPU is not available, the function returns an error. For information about the supported compute capabilities, see GPU Computing Requirements (Parallel Computing Toolbox).
    "cpu"      Use the CPU.

    Output Arguments

    textEmbeddings

    Text embeddings extracted from the CLIP model encoder, returned as a 512-by-B or 768-by-B matrix, depending on the value of the backbone argument of the clipNetwork object.

    Image Encoder Backbone (backbone Value)   Text Embeddings Format
    "vit-b-16"                                512-by-B matrix
    "vit-l-14" or "resnet50"                  768-by-B matrix
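A typical downstream use of the returned columns is comparing search terms by cosine similarity. A minimal sketch, assuming the ViT-B/16 backbone (512-by-B output):

```matlab
clip = clipNetwork("vit-b-16");
emb = extractTextEmbeddings(clip, ...
    ["a photo of a dog","a photo of a puppy"]);

% Normalize each 512-by-1 column, then compare with a dot product.
emb = emb ./ vecnorm(emb);
similarity = emb(:,1)' * emb(:,2);   % scalar in [-1, 1]
```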

    Version History

    Introduced in R2026a