
matrixProfile

Compute matrix profile between all pairs of subsequences in a single-variable or multivariable time series

Since R2024b

Description

Return Matrix Profile

MP = matrixProfile(X,len) returns the matrix profile of the single-variable or multivariable time series X. The matrix profile is the vector of minimum z-normalized Euclidean distances between each subsequence of X with length len and its closest neighbor.

  • If X is a vector, then the software treats it as a single channel.

  • If X is a matrix, then the software computes the matrix profile independently for each column (multivariable solution).
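The definition above can be sketched directly in base MATLAB by comparing every pair of z-normalized subsequences. This brute-force loop only illustrates what MP and MPI contain. It is not how matrixProfile computes them. It assumes z-normalization with the population standard deviation and a hypothetical exclusion half-width of ceil(len/2) points to skip trivial matches. The example data is made up.

```matlab
% Brute-force matrix profile for one channel: O(m^2 * len) work, so this is
% for illustration only, not a replacement for matrixProfile.
% Assumptions: population std in the z-normalization and a hypothetical
% exclusion half-width of ceil(len/2) points.
n    = 200;
t    = (1:n)';
X    = sin(0.1*t) + 0.5*sin(0.37*t);     % hypothetical example data
len  = 20;
m    = n - len + 1;                       % number of subsequences
zone = ceil(len/2);                       % assumed exclusion half-width

znorm = @(s) (s - mean(s)) / std(s, 1);   % z-normalize one subsequence

MP  = inf(m,1);                           % distance to the nearest neighbor
MPI = zeros(m,1);                         % start index of the nearest neighbor
for i = 1:m
    qi = znorm(X(i:i+len-1));
    for j = 1:m
        if abs(i - j) < zone              % skip trivial matches near i
            continue
        end
        d = norm(qi - znorm(X(j:j+len-1)));
        if d < MP(i)
            MP(i)  = d;
            MPI(i) = j;
        end
    end
end
```

For a matrix X, running the same loop on each column X(:,c) gives one column of MP per channel, in line with the per-column behavior described above. Because the distance is symmetric, MP(MPI(k)) is never larger than MP(k).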

matrixProfile provides two different algorithms for performing the computations.

  • The STAMP algorithm (scalable time series anytime matrix profile) supports anytime and parallel computation, and works with both single-variable and multivariable data sets. Anytime capability allows you to stop the algorithm before it completes and still obtain an acceptably accurate solution. This is especially useful when computing a complete solution requires a significant amount of time. The MaxIteration name-value argument determines when to stop the computation.

  • The STOMP algorithm (scalable time series ordered matrix profile) is approximately log2(n) times faster than the STAMP algorithm, and is useful for single-variable time series if you have a GPU and do not need anytime capability.
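In the Matrix Profile papers, STAMP computes each distance profile independently with an FFT-based similarity search. STOMP instead processes the query subsequences in order and updates the dot products of subsequence i with every subsequence j from those of subsequence i-1, at constant cost per element. The sketch below checks that recurrence against direct computation on made-up data. It illustrates the idea only and does not reproduce the implementation inside matrixProfile.

```matlab
% Ordered dot-product update used by STOMP-style algorithms:
% QT(i,j) = dot(X(i:i+len-1), X(j:j+len-1)), derived from row i-1.
n   = 120;
t   = (1:n)';
X   = sin(0.2*t) + 0.3*cos(0.05*t);      % hypothetical example data
len = 16;
m   = n - len + 1;

% First row computed directly.
QT = zeros(m,1);
for j = 1:m
    QT(j) = X(1:len)' * X(j:j+len-1);
end
QT1 = QT;                                 % row 1; equals column 1 by symmetry

maxErr = 0;
for i = 2:m
    prev = QT;
    % Drop the leading product, add the trailing product (O(1) per entry).
    QT(2:m) = prev(1:m-1) - X(i-1)*X(1:m-1) + X(i+len-1)*X(len+1:n);
    QT(1)   = QT1(i);                     % QT(i,1) = QT(1,i)
    % Compare with the direct O(len)-per-entry computation of row i.
    direct = zeros(m,1);
    for j = 1:m
        direct(j) = X(i:i+len-1)' * X(j:j+len-1);
    end
    maxErr = max(maxErr, max(abs(QT - direct)));
end
fprintf('max abs difference: %g\n', maxErr);
```

Each row update costs O(n) rather than the O(n log n) of an FFT-based distance profile, which is consistent with the log-factor speedup quoted above.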

You can use the functions findDiscord and findMotif to find the locations of the top discords and motifs in MP.

[MP,MPI] = matrixProfile(X,len) also returns the matrix profile index vector MPI, which contains the starting index of the nearest neighbor of each subsequence.


[___] = matrixProfile(___,Name=Value) specifies options using one or more name-value arguments in addition to the arguments in previous syntaxes. For example, to use parallel processing, set UseParallel to true.

Plot Matrix Profile

matrixProfile(___) with no output arguments creates an interactive plot of the matrix profile. You can use this syntax with any of the previous input-argument combinations.


Examples


Load the timetable T1, which contains armature current measurements from a degrading DC motor.

load matrix_profile_data T1

Set the time series X to T1.MotorCurrent and the query subsequence length len to 100.

X = T1.MotorCurrent;
len = 100;

Calculate the matrix profile.

[MP,MPI] = matrixProfile(X,len);

Plot the matrix profile.

matrixProfile(X,len)

Matrix Profile Plots. The Time-Series plot is on the top. Overlays of yellow and purple on the plotted data show the two top motif pairs and the discord. The Matrix Profile plot, which plots the distances, is in the middle. The Subsequences plot is on the bottom, and shows the subsequences for the top two motif pairs and the discord together.

The profile shows that the two top motif pairs, or segments that agree best with their neighbors, occur at locations 6717 and 3119. These locations correspond to minima in the matrix profile plot.

The profile also shows a single discord at location 9797. This subsequence visibly deviates from the motif subsequences for much of its length.

Use findDiscord to find more discords, which are the locations of the segments with the largest distances to their nearest neighbors. Show the top four locations.

locs = findDiscord(MP,MPI);
toplocs = locs(1:4)
toplocs = 4×1

        9797
        9800
        9802
        9792

Show the corresponding distances.

topdist = MP(toplocs)
topdist = 4×1

    8.3894
    8.2062
    8.1517
    7.9777

Plot the findDiscord results.

figure
findDiscord(MP,MPI)

Figure: Matrix Profile plot (x-axis: Time, y-axis: Distance) showing the Distance line and the Discord markers.

Discords that are close to each other are probably part of the same anomaly, so you need to identify only one discord for each such segment. Set a minimum separation of 40 between discords and limit the number of discords to 10.

findDiscord(MP,MPI,MinSeparation=40,MaxNumDiscords=10)

findDiscord Plots. The Time-Series plot is on the top. The Matrix Profile plot is in the middle. The Matrix Profile Discord plot is on the bottom, and now shows discrete discord instances.

The top discord is at location 9797, as the original matrix profile showed. The plot also shows significant discords at other locations.
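Limiting discords with a minimum separation can be thought of as a greedy selection: take the largest remaining matrix profile value, then suppress all candidates within the separation distance before picking the next one. The sketch below shows that idea on a made-up profile. The variables minSep and maxDisc only mirror MinSeparation and MaxNumDiscords in spirit and are hypothetical, and findDiscord may select discords differently.

```matlab
% Greedy discord selection with a minimum separation (illustrative sketch,
% not necessarily how findDiscord works internally).
MP = [0.5 0.6 3.0 2.9 2.8 0.4 0.7 2.0 1.9 0.3]';   % made-up profile values
minSep  = 3;        % hypothetical analogue of MinSeparation
maxDisc = 2;        % hypothetical analogue of MaxNumDiscords

work = MP;
locs = zeros(0,1);
while numel(locs) < maxDisc && any(isfinite(work))
    [~, k] = max(work);                   % largest remaining distance
    locs(end+1,1) = k;                    %#ok<AGROW>
    lo = max(1, k - minSep + 1);
    hi = min(numel(work), k + minSep - 1);
    work(lo:hi) = -Inf;                   % suppress candidates closer than minSep
end
disp(locs')
```

Without the suppression step, the two picks would be the adjacent locations 3 and 4, which belong to the same anomaly. With it, the picks are 3 and 8.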

Input Arguments


X: Time series

Time series to evaluate, specified as a numeric vector of length n or a numeric matrix whose columns each have length n. X must not contain missing data.

len: Subsequence length

Length of the query subsequence, specified as an integer. len must be less than the time series length n.

Name-Value Arguments


Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: MP = matrixProfile(X,len,UseParallel=true) results in parallel processing.

ExclusionZoneLength: Exclusion zone length

Length of the exclusion zone around the query subsequence during matrix profile computations, specified as the number of data points to exclude. This argument prevents trivial matches between the query subsequence and itself.
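To see why the exclusion zone matters, compute a single distance profile (the distances from one query subsequence to all others) with and without excluding the neighborhood of the query. Without exclusion, the nearest neighbor of the query is the query itself, at distance 0. The half-width ceil(len/2) below is an assumed value for illustration, not the default of ExclusionZoneLength, and the data is made up.

```matlab
% One distance profile, with and without an exclusion zone.
n   = 200;
t   = (1:n)';
X   = sin(0.15*t) + 0.2*sin(1.1*t);       % hypothetical example data
len = 25;
m   = n - len + 1;
znorm = @(s) (s - mean(s)) / std(s, 1);   % population-std z-normalization

i  = 50;                                  % arbitrary query subsequence
qi = znorm(X(i:i+len-1));
d  = zeros(m,1);
for j = 1:m
    d(j) = norm(qi - znorm(X(j:j+len-1)));
end

[dNoZone, jNoZone] = min(d);              % trivial match: the query itself
zone = ceil(len/2);                       % assumed exclusion half-width
dZ = d;
dZ(max(1,i-zone+1):min(m,i+zone-1)) = Inf;
[dZone, jZone] = min(dZ);                 % nearest non-trivial match
fprintf('no zone: j=%d, d=%.3g | with zone: j=%d, d=%.3g\n', ...
        jNoZone, dNoZone, jZone, dZone);
```

For smooth data, the distances at j = i-1 and j = i+1 are also typically small, which is why the zone covers a neighborhood rather than only j = i.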

EndPoints: Handling of query windows near the endpoints

Method for handling query windows near the endpoints of X, specified as one of these options:

  • "discard": Truncate the output vectors MP and MPI to length n - len + 1, where n is the length of X.

  • "fill": Extend MP and MPI to length n by padding MP with len - 1 NaNs. The software sets the last len - 1 elements of MPI to the sequence (n-len+2:n).
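The two EndPoints options differ only in the lengths and padding of the outputs. This sketch applies the "fill" rule described above to placeholder vectors of "discard" length, for n = 10 and len = 4. The values in the placeholder vectors are made up.

```matlab
% EndPoints illustration with placeholder values (n = 10, len = 4).
n   = 10;
len = 4;
MPdiscard  = (1:n-len+1)' / 10;           % placeholder distances, length n-len+1
MPIdiscard = (n-len+1:-1:1)';             % placeholder neighbor indices

% "fill": pad MP with len-1 NaNs; set the last len-1 MPI entries to n-len+2:n
MPfill  = [MPdiscard;  NaN(len-1,1)];
MPIfill = [MPIdiscard; (n-len+2:n)'];

fprintf('discard length: %d, fill length: %d\n', numel(MPdiscard), numel(MPfill));
disp(MPIfill(end-len+2:end)')
```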

MaxIteration: Maximum number of iterations

Maximum number of iterations for computing an upper bound on MP when Algorithm is "STAMP", specified as an integer. MaxIteration determines how long the computation runs when you use the anytime capability to stop the algorithm before it completes.

The default value is n - len + 1, which runs the algorithm to completion.
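The anytime behavior of STAMP can be sketched as follows: evaluate the distance profiles of the query subsequences in random order and keep a running elementwise minimum. Each evaluated distance profile can only lower the running values, so stopping early (here after maxIter queries, a hypothetical stand-in for MaxIteration) yields an upper bound on the exact matrix profile, and evaluating all n - len + 1 queries yields the exact profile. The data, the exclusion half-width, and maxIter are assumptions for illustration.

```matlab
% Anytime (STAMP-style) sketch: random query order, running elementwise minimum.
n    = 150;
t    = (1:n)';
X    = sin(0.21*t) + 0.4*cos(0.047*t);   % hypothetical example data
len  = 12;
m    = n - len + 1;
zone = ceil(len/2);                       % assumed exclusion half-width
maxIter = 40;                             % hypothetical stand-in for MaxIteration

% Store every z-normalized subsequence as a column (population std).
Z = zeros(len, m);
for j = 1:m
    s = X(j:j+len-1);
    Z(:,j) = (s - mean(s)) / std(s, 1);
end

P = inf(m,1);                             % running profile: an upper bound
order = randperm(m);                      % random query order (unseeded)
for it = 1:m
    i = order(it);
    d = sqrt(sum((Z - Z(:,i)).^2, 1))';   % distances from query i to all j
    d(max(1,i-zone+1):min(m,i+zone-1)) = Inf;   % exclusion zone around i
    P = min(P, d);                        % symmetric distance updates every j
    P(i) = min(P(i), min(d));             % and the query's own entry
    if it == maxIter
        Ppartial = P;                     % snapshot if stopped early
    end
end

% Exact profile for comparison: every query evaluated.
MPexact = inf(m,1);
for i = 1:m
    d = sqrt(sum((Z - Z(:,i)).^2, 1))';
    d(max(1,i-zone+1):min(m,i+zone-1)) = Inf;
    MPexact(i) = min(d);
end
fprintf('entries already exact after %d queries: %d of %d\n', ...
        maxIter, sum(Ppartial == MPexact), m);
```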

UseParallel: Option to use parallel pool

Option to use the parallel pool to speed up computations, specified as false (serial computation) or true (parallel computation).

You can set this option to true only when X is a CPU array, not a GPU array.

Algorithm: Matrix profile algorithm

Matrix profile algorithm to use, specified as "STAMP" or "STOMP".

  • The STAMP algorithm (scalable time series anytime matrix profile) supports anytime and parallel computation, and works with both single-variable and multivariable data sets. Anytime capability allows you to stop the algorithm before it completes and still obtain an acceptably accurate solution. The value of MaxIteration controls the algorithm stop time.

  • The STOMP algorithm (scalable time series ordered matrix profile) is approximately log2(n) times faster than the STAMP algorithm, and is useful for single-variable time series if you have a GPU and do not need anytime capability.

The algorithms you can use depend on your computational configuration and the number of data variables.

  • Multicore CPU, UseParallel=1: STAMP only

  • GPU array, UseParallel=0: STOMP or STAMP

  • Multivariable data, any CPU/GPU configuration: STAMP only

Output Arguments


MP: Matrix profile

Matrix profile, returned as a numeric vector or matrix. Each element of MP is the distance between a subsequence of X of length len and its best-matching neighbor.

The best-matching neighbor of a subsequence is the subsequence with the minimum z-normalized Euclidean distance to it.
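The z-normalized Euclidean distance between two subsequences can be written in terms of their dot product, which is what lets fast algorithms avoid recomputing each distance from scratch. With ℓ = len, Q_{i,j} the dot product of the subsequences starting at i and j, and μ_i, σ_i the mean and population standard deviation of subsequence i, the standard identity from the matrix profile literature is:

$$
d_{i,j} = \sqrt{2\ell\left(1 - \frac{Q_{i,j} - \ell\,\mu_i\,\mu_j}{\ell\,\sigma_i\,\sigma_j}\right)}
$$

The fraction inside the parentheses is the Pearson correlation of the two subsequences, so d_{i,j} ranges from 0 (perfectly correlated) to 2\sqrt{\ell} (perfectly anticorrelated).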

  • If X is a vector, then the software treats it as a single channel when computing MP.

  • If X is a matrix, then the software computes the matrix profile independently for each column.

The length of MP is n or n - len + 1, depending on the value of EndPoints.

The ExclusionZoneLength value prevents false matches with the query subsequences themselves as the algorithm goes through all possible query sequences in X.

You can use the functions findDiscord and findMotif to find the locations of the top discords and top motif pairs, respectively, in MP.

MPI: Matrix profile index

Matrix profile index vector. MPI(k) is the starting index of the nearest neighbor of the subsequence of length len that starts at index k.

The distance from subsequence X(k:k+len-1) to subsequence X(MPI(k):MPI(k)+len-1) is MP(k), the minimum distance from X(k:k+len-1) to any other subsequence of X outside the exclusion zone.



Version History

Introduced in R2024b
