matrixProfile

Compute matrix profile between all pairs of subsequences in a single-variable or multivariable time series

Since R2024b

Syntax

MP = matrixProfile(X,len)

[MP,MPI] = matrixProfile(X,len)

[___] = matrixProfile(___,Name=Value)

matrixProfile(___)

Description

Return Matrix Profile

MP = matrixProfile(X,len) returns the matrix profile of the single-variable or multivariable time series X. The matrix profile is the vector of minimum z-normalized Euclidean distances between each subsequence of X with length len and its closest neighbor.

If X is a vector, then the software treats it as a single channel.
If X is a matrix, then the software computes the matrix profile independently for each column (multivariable solution).
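The definition above can be made concrete with a brute-force computation in base MATLAB. This is only an illustrative sketch, not the STAMP or STOMP algorithm that matrixProfile uses. The synthetic signal, the variable names (S, mpBrute, mpiBrute), the use of the population standard deviation for z-normalization, and the exclusion zone of ceil(len/2) samples on each side of the query are all assumptions made for the example.

```matlab
% Brute-force matrix profile of a short synthetic series (illustration only).
% Assumptions: population standard deviation (std(s,1)) for z-normalization
% and an exclusion zone of ceil(len/2) samples on each side of the query.
x = sin(2*pi*(0:199)'/25) + 0.1*cos(2*pi*(0:199)'/7);
len = 20;
ez = ceil(len/2);
m = numel(x) - len + 1;              % number of subsequences

% Store each z-normalized subsequence as one column of S
S = zeros(len,m);
for k = 1:m
    s = x(k:k+len-1);
    S(:,k) = (s - mean(s))/std(s,1);
end

mpBrute = inf(m,1);                  % distance to the nearest neighbor
mpiBrute = zeros(m,1);               % start index of the nearest neighbor
for k = 1:m
    d = sqrt(sum((S - S(:,k)).^2,1))';   % distance from query k to every subsequence
    lo = max(1,k-ez);
    hi = min(m,k+ez);
    d(lo:hi) = inf;                      % ignore trivial matches near the query
    [mpBrute(k),mpiBrute(k)] = min(d);
end
```

The cost of this approach grows with the square of the number of subsequences, which is why it is practical only for short series.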

matrixProfile provides two different algorithms for performing the computations.

The STAMP algorithm (scalable time series anytime matrix profile) supports anytime and parallel computation, and works with both single-variable and multivariable data sets. Anytime capability allows you to stop the algorithm before it completes and still obtain an acceptably accurate solution. This is especially useful when computing a complete solution requires a significant amount of time. The MaxIteration name-value argument determines when to stop the computation.
The STOMP algorithm (scalable time series ordered matrix profile) is approximately log2(n) times faster than the STAMP algorithm, and is useful for single-variable time series if you have a GPU and do not need anytime capability.

You can use the functions findDiscord and findMotif to find the locations of the top discords and motifs in MP.

[MP,MPI] = matrixProfile(X,len) also returns the matrix profile index vector MPI, which contains the location of the nearest neighbor of each subsequence.

[___] = matrixProfile(___,Name=Value) specifies options using one or more name-value arguments in addition to the arguments in previous syntaxes. For example, to use parallel processing, set UseParallel to true.

Plot Matrix Profile

matrixProfile(___) creates an interactive plot of the matrix profile. You can use this syntax with any of the previous input-argument combinations.

Examples

Create Matrix Profile and Find Top Discords

Load the data, which consists of the timetable T1. T1 contains armature current measurements of a degrading DC motor.

load matrix_profile_data T1

Set the time series variable X to T1.MotorCurrent and the query segment length to 100.

X = T1.MotorCurrent;
len = 100;

Calculate the matrix profile.

[MP,MPI] = matrixProfile(X,len);

Plot the matrix profile.

matrixProfile(X,len)

Matrix Profile Plots. The Time-Series plot is on the top. Overlays of yellow and purple on the plotted data show the two top motif pairs and the discord. The Matrix Profile plot, which plots the distances, is in the middle. The Subsequences plot is on the bottom, and shows the subsequences for the top two motif pairs and the discord together.

The profile shows that the two top motif pairs, which are the segments that agree best with their neighbors, occur at locations 6717 and 3119. These locations are consistent with minima in the matrix profile plot.

The profile also shows a single discord at location 9797. This subsequence visibly deviates from the motif subsequences for much of its length.

Use findDiscord to find more discords, which are the locations of segments with the furthest distances from their neighbors. Show the top four locations.

locs = findDiscord(MP,MPI);
toplocs = locs(1:4)

toplocs = 4×1

        9797
        9800
        9802
        9792

Show the corresponding distances.

topdist = MP(toplocs)

topdist = 4×1

    8.3894
    8.2062
    8.1517
    7.9777

Plot the findDiscord results.

figure
findDiscord(MP,MPI)

findDiscord Plot. The Matrix Profile plot shows Distance against Time as a line, with the discords overlaid as markers.

Discords that are close to each other are probably part of the same anomaly. You need to identify only one discord for such a segment. Improve the segment separation and limit the number of discords to 10.

findDiscord(MP,MPI,MinSeparation=40,MaxNumDiscords=10)

findDiscord Plots. The Time-Series plot is on the top. The Matrix Profile plot is in the middle. The Matrix Profile Discord plot is on the bottom, and now shows discrete discord instances.

The highest discord is at location 9797, as the original matrix profile showed. The plot also shows significant discords in other locations.

Input Arguments

`X` — Time series to evaluate
numeric vector | numeric matrix

Time series to evaluate, specified as a numeric vector of length n or a numeric matrix containing multiple columns of length n. X must not have any missing data.

`len` — Length of query subsequence
integer

Length of the query subsequence, specified as an integer. len must be less than the time series length n.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: MP = matrixProfile(X,len,UseParallel=true) results in parallel processing.

`ExclusionZoneLength` — Length of exclusion zone
`ceil(len/2)` (default) | integer

Length of the exclusion zone around the query sequence during matrix profile computations, specified as the number of data points to exclude. This argument prevents false matches with the query subsequence itself.

`EndPoints` — Options for controlling output lengths
`"discard"` (default) | `"fill"`

Method for handling query windows near the endpoints of X, specified as one of these options:

"discard" — Truncate the length of the output vectors MP and MPI to n – len + 1, where n is the length of X.
"fill" — Extend the length of MP and MPI to n by padding MP with len – 1 NaNs. The software sets the last len – 1 elements of MPI to the sequence (n – len + 2:n).
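As a rough illustration of the two EndPoints options, the following sketch compares the output lengths for a series with n = 1000 and len = 100. The random test signal and the variable names are assumptions made for the example, and the comments restate the lengths given in the option descriptions rather than verified output.

```matlab
% Synthetic series (assumed data) with n = 1000 samples
rng(0)
x = randn(1000,1);
len = 100;

% Default "discard": outputs should have n - len + 1 elements
[mpDiscard,mpiDiscard] = matrixProfile(x,len);
nDiscard = numel(mpDiscard);

% "fill": outputs should have n elements, with MP padded by len - 1 NaNs
[mpFill,mpiFill] = matrixProfile(x,len,EndPoints="fill");
nFill = numel(mpFill);
nPadded = nnz(isnan(mpFill));        % number of NaN padding entries
```

The "fill" option can be convenient when you want MP to line up sample for sample with the original series, for instance when plotting both against the same time axis.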

`MaxIteration` — Maximum number of iterations when using the Anytime capability of the STAMP algorithm
n – `len`+1 (default) | integer

Maximum number of iterations for computing an upper bound on MP when Algorithm is specified as "STAMP", specified as an integer. MaxIteration determines the duration of the computation when you use the anytime capability to stop the algorithm before it completes.

The default value is n – len + 1, which runs the algorithm to completion.
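The following sketch shows one way to try the anytime capability by stopping STAMP after about a quarter of the default number of iterations and comparing the result with the complete profile. The random-walk test signal, the one-quarter fraction, and the variable names are assumptions made for illustration only.

```matlab
% Synthetic single-variable series (assumed data)
rng(0)
x = cumsum(randn(2000,1));
len = 50;

% Complete computation: the default MaxIteration is n - len + 1
mpFull = matrixProfile(x,len);

% Anytime computation: stop after roughly a quarter of the iterations
nIter = floor((numel(x) - len + 1)/4);
mpApprox = matrixProfile(x,len,Algorithm="STAMP",MaxIteration=nIter);

% Largest difference between the early-stopped and complete profiles
gap = max(mpApprox - mpFull);
```

Because the early-stopped result is described as an upper bound on MP, inspecting gap for your own data gives a sense of how much accuracy a given iteration budget gives up.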

`UseParallel` — Option to use parallel pool
`false` (default) | `true`

Option to use the parallel pool to speed up computations, specified as false for serial computation or true for parallel computation.

You can set this option to true only when you are using a CPU array, and not a GPU array.

`Algorithm` — Matrix profile algorithm to use
`"STAMP"` (default) | `"STOMP"`

Matrix profile algorithm to use, specified as "STAMP" or "STOMP".

The STAMP algorithm (scalable time series anytime matrix profile) supports anytime and parallel computation, and works with both single-variable and multivariable data sets. Anytime capability allows you to stop the algorithm before it completes and still obtain an acceptably accurate solution. The value of MaxIteration controls the algorithm stop time.
The STOMP algorithm (scalable time series ordered matrix profile) is approximately log2(n) times faster than the STAMP algorithm, and is useful for single-variable time series if you have a GPU and do not need anytime capability.

The algorithms you can use depend on your computational configuration and the number of data variables.

| Multicore CPU | GPU Array | Multivariable |
|---|---|---|
| `UseParallel=1` | `UseParallel=0` | Any CPU/GPU Config |
| STAMP only | STOMP or STAMP | STAMP only |

Output Arguments

`MP` — Matrix profile
numeric vector | numeric matrix

Matrix profile containing the distance from each subsequence in time series X to its best-matching neighbor, where each subsequence has a length of len, returned as a numeric vector or matrix.

The best-matching neighbor of a subsequence is the subsequence with the minimum z-normalized Euclidean distance to it.

If X is a vector, then the software treats it as a single channel when computing MP.
If X is a matrix, then the software computes the matrix profile independently for each column.

The length of MP is equal to n or n – len + 1, depending on the setting for EndPoints.

The ExclusionZoneLength value prevents false matches with the query subsequences themselves as the algorithm goes through all possible query sequences in X.

You can use the functions findDiscord and findMotif to find the locations of the top discords and top motif pairs, respectively, in MP.

`MPI` — Matrix profile index vector
integer vector | integer matrix

Matrix profile index, returned as an integer vector or matrix. MPI(k) is the location of the nearest neighbor of the subsequence of length len that starts at index k.

The distance from subsequence X(k:k+len-1) to subsequence X(MPI(k):MPI(k)+len-1) is the minimum distance possible.
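The relationship between MP and MPI can be checked by hand for a single query location. The sketch below extracts one subsequence and its reported nearest neighbor and computes their z-normalized Euclidean distance in base MATLAB. The test signal, the query location k = 200, and the choice of the population standard deviation for z-normalization are assumptions, so the manual value d may differ slightly from MP(k) if the function normalizes differently.

```matlab
% Synthetic series (assumed data)
rng(0)
x = cumsum(randn(1000,1));
len = 50;
[mp,idx] = matrixProfile(x,len);

k = 200;                                 % arbitrary query location
q = x(k:k+len-1);                        % query subsequence
nn = x(idx(k):idx(k)+len-1);             % its reported nearest neighbor

% z-normalize both subsequences (assumption: population standard deviation)
zq = (q - mean(q))/std(q,1);
zn = (nn - mean(nn))/std(nn,1);

d = norm(zq - zn);                       % manual distance to compare with mp(k)
```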

References

[1] Yeh, Chin-Chia Michael, et al. “Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View That Includes Motifs, Discords and Shapelets.” 2016 IEEE 16th International Conference on Data Mining (ICDM), IEEE, 2016, pp. 1317–22. https://doi.org/10.1109/ICDM.2016.0179.

[2] Zhu, Yan, et al. “Matrix Profile II: Exploiting a Novel Algorithm and GPUs to Break the One Hundred Million Barrier for Time Series Motifs and Joins.” 2016 IEEE 16th International Conference on Data Mining (ICDM), IEEE, 2016, pp. 739–48. https://doi.org/10.1109/ICDM.2016.0085.

Extended Capabilities

Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, set the UseParallel name-value argument to true in the call to this function.

For more general information about parallel computing, see Run MATLAB Functions with Automatic Parallel Support (Parallel Computing Toolbox).

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

The matrixProfile function fully supports GPU arrays. To run the function on a GPU, specify the input data as a gpuArray (Parallel Computing Toolbox). For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
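A minimal sketch of a GPU workflow follows, assuming Parallel Computing Toolbox is installed. The test signal and variable names are assumptions, and the CPU fallback branch is added only so that the example also runs on a machine without a supported GPU.

```matlab
% Synthetic single-variable series (assumed data)
rng(0)
x = cumsum(randn(5000,1));
len = 100;

if canUseGPU
    xg = gpuArray(x);                                 % move the data to the GPU
    mpG = matrixProfile(xg,len,Algorithm="STOMP");    % single-variable STOMP on the GPU
    mp = gather(mpG);                                 % copy the result back to host memory
else
    mp = matrixProfile(x,len);                        % CPU fallback with the default algorithm
end
```

UseParallel is left at its default of false here, because the option can be set to true only for CPU arrays.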

Version History

Introduced in R2024b

R2025a: Capabilities added for GPU arrays and parallel processing

Added extended capabilities for GPU processing and parallel processing.

R2025a: Multivariable processing capability and new faster STOMP algorithm for single-variable processing

Ability to compute the matrix profile for multivariable time series added.

Option added to use the STOMP algorithm (scalable time series ordered matrix profile), which reduces the processing time for single-variable computations that use a GPU.
