Extract mel-frequency cepstral coefficients from audio
Audio Toolbox / Features
The MFCC block extracts feature vectors containing the mel-frequency cepstral coefficients (MFCCs), as well as their delta and delta-delta features, from the audio input signal. MFCCs are popular features extracted from speech signals for use in classification tasks.
Port_1 — Audio input
column vector | matrix
Audio input signal, specified as a column vector or a matrix. When you specify a matrix, the block treats columns as independent audio channels.
Port_1 — MFCC features
matrix | 3-D array
MFCC features, returned as a matrix or 3-D array. The features include the MFCCs themselves and optionally include the delta and delta-delta features of the MFCCs. The dimensions of the output are L-by-M-by-N, where:
L is the number of feature vectors, which is specified by the Number of feature vectors parameter.
M is the number of features in each feature vector, which includes the MFCCs and any appended delta and delta-delta features.
N is the number of channels in the input audio signal.
Trailing dimensions of size 1 are removed from the output.
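As a rough illustration of how the output size follows from the parameters, the sketch below computes the expected dimensions in Python. The function name and arguments are hypothetical, and it assumes that each selected append option adds one group of features equal in size to the number of cepstral coefficients, as the Append delta and Append delta-delta descriptions state.

```python
def mfcc_output_shape(num_vectors, num_coeffs, append_delta,
                      append_delta_delta, num_channels):
    """Return the expected output size as (L, M) or (L, M, N)."""
    # Assumption: each selected option appends num_coeffs extra features.
    features_per_vector = num_coeffs * (
        1 + int(append_delta) + int(append_delta_delta)
    )
    shape = (num_vectors, features_per_vector, num_channels)
    # A trailing channel dimension of size 1 is dropped, leaving a matrix.
    if num_channels == 1:
        shape = shape[:2]
    return shape


# Default parameters with a single-channel input.
print(mfcc_output_shape(1, 13, True, True, 1))   # (1, 39)
# Four feature vectors, delta only, stereo input.
print(mfcc_output_shape(4, 13, True, False, 2))  # (4, 26, 2)
```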
Window — Analysis window
hamming(1024,'periodic') (default) | real vector
Analysis window applied to the input signal in the time domain, specified as a real vector.
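For readers working outside MATLAB, the default window can be approximated with the Python sketch below. It assumes the usual definition of a periodic Hamming window, that is, a symmetric window of length n + 1 with the last sample dropped. The function name is hypothetical and this is not the block's implementation.

```python
import math


def periodic_hamming(n):
    """Periodic Hamming window of length n (assumed standard definition)."""
    # Dividing by n instead of n - 1 is what makes the window periodic.
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / n) for k in range(n)]


window = periodic_hamming(1024)
print(len(window))  # 1024
```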
Overlap length — Number of overlapping samples between adjacent windows
512 (default) | nonnegative integer less than the window length
Number of overlapping samples between adjacent windows, specified as a nonnegative integer less than the length of the analysis window. The window length is determined by the Window parameter.
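The relationship between the window length, the overlap length, and the resulting hop between windows can be sketched as follows. This is an offline illustration in Python with hypothetical names. The block itself processes streaming input, so treat this only as a picture of where successive windows would start.

```python
def frame_starts(num_samples, window_length, overlap_length):
    """Start index of every full analysis window in a signal."""
    if not 0 <= overlap_length < window_length:
        raise ValueError("overlap must be in the range [0, window length)")
    # Each window advances by the samples that are not shared with the next.
    hop = window_length - overlap_length
    return list(range(0, num_samples - window_length + 1, hop))


# One second of audio at 44.1 kHz with the default window and overlap.
starts = frame_starts(44100, 1024, 512)
print(len(starts))  # 85
```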
Number of cepstral coefficients — Number of cepstral coefficients in each feature vector
13 (default) | positive integer greater than 1
Number of cepstral coefficients in each feature vector, specified as a positive integer greater than 1.
Rectification — Type of nonlinear rectification
Logarithm (default)
Type of nonlinear rectification applied to the spectrum prior to the discrete cosine transform. The default is Logarithm.
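To make the order of operations concrete, the sketch below applies logarithm rectification to a set of mel band energies and then takes an unnormalized DCT-II. The function name is hypothetical, and the DCT scaling that the block uses is not stated here, so the absolute values are illustrative only.

```python
import math


def cepstral_coefficients(band_energies, num_coeffs):
    """Log-rectify positive band energies, then apply an unnormalized DCT-II."""
    # Rectification step: natural logarithm of each band energy.
    log_energies = [math.log(e) for e in band_energies]
    num_bands = len(log_energies)
    coeffs = []
    for k in range(num_coeffs):
        # DCT-II basis function k evaluated at the band centers.
        total = sum(
            log_energies[n] * math.cos(math.pi * k * (n + 0.5) / num_bands)
            for n in range(num_bands)
        )
        coeffs.append(total)
    return coeffs


# 32 bands of made-up energies, 13 coefficients as in the defaults.
example = cepstral_coefficients([1.0 + 0.1 * n for n in range(32)], 13)
print(len(example))  # 13
```

A flat spectrum puts all of its energy into coefficient 0, which is one way to check the transform.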
Append delta — Append delta of MFCCs to feature vectors
on (default) | off
When you select this parameter, the block appends the delta of the MFCCs to the coefficients in each feature vector. The delta is an approximation of the first derivative of the MFCCs with respect to time. The number of delta features is equal to the number of MFCCs, which is specified by Number of cepstral coefficients.
Append delta-delta — Append delta-delta of MFCCs to feature vectors
on (default) | off
When you select this parameter, the block appends the delta-delta of the MFCCs to each output feature vector. The delta-delta is an approximation of the second derivative of the MFCCs with respect to time. The number of delta-delta features is equal to the number of MFCCs, which is specified by Number of cepstral coefficients.
The block appends the delta-delta after the delta in the feature vectors if you also select the Append delta parameter.
Delta window length — Number of coefficients for calculating delta and delta-delta
9 (default) | odd integer greater than 2
Number of coefficients for calculating delta and delta-delta, specified as an odd integer greater than 2.
Number of feature vectors — Number of MFCC feature vectors in output
1 (default) | positive integer
Number of MFCC feature vectors in output, specified as a positive integer. The block buffers the output to return the specified number of feature vectors.
Number of overlapped feature vectors — Number of feature vectors overlapped in output
0 (default) | nonnegative integer
Number of feature vectors the block overlaps in the output, specified as a nonnegative integer less than Number of feature vectors.
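One way to picture the interaction between Number of feature vectors and Number of overlapped feature vectors is the buffering sketch below. The names are hypothetical, and the sketch assumes that the first output is produced only once the buffer is full. The block's actual initial conditions are not described here.

```python
from collections import deque


def buffered_outputs(feature_vectors, num_vectors, num_overlap):
    """Group a stream of feature vectors into overlapping output buffers."""
    if not 0 <= num_overlap < num_vectors:
        raise ValueError("overlap must be less than the number of vectors")
    # Number of new vectors needed between consecutive outputs.
    step = num_vectors - num_overlap
    buffer = deque(maxlen=num_vectors)
    outputs = []
    new_since_output = 0
    for vector in feature_vectors:
        buffer.append(vector)
        new_since_output += 1
        if len(buffer) == num_vectors and new_since_output >= step:
            outputs.append(list(buffer))
            new_since_output = 0
    return outputs


# Integers stand in for feature vectors; consecutive outputs share one item.
print(buffered_outputs(list(range(7)), 3, 1))
# [[0, 1, 2], [2, 3, 4], [4, 5, 6]]
```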
Inherit sample rate from input — Specify source of input sample rate
off (default) | on
When you select this parameter, the block inherits its sample rate from the input signal. When you clear this parameter, you specify the sample rate in the Input sample rate (Hz) parameter.
Input sample rate (Hz) — Sample rate of input
44.1e3 (default) | positive scalar
Input sample rate in Hz, specified as a positive scalar.
To enable this parameter, clear the Inherit sample rate from input parameter.
Number of bands — Number of bands in mel filter bank
32 (default) | positive integer
Number of bands in mel filter bank, specified as a positive integer.
Auto-determine frequency range — Automatically determine frequency range
on (default) | off
When you select this parameter, the block sets the Frequency range (Hz) parameter to [0, fs/2], where fs is the sample rate. The block determines the sample rate using the Inherit sample rate from input and Input sample rate (Hz) parameters.
Frequency range (Hz) — Frequency range of mel filter bank
[0,22050] (default) | two-element row vector
Frequency range in Hz of mel filter bank, specified as a two-element row vector.
To enable this parameter, clear the Auto-determine frequency range parameter.
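The sketch below shows how a frequency range and a number of bands could translate into mel-spaced band edges. It assumes the common conversion mel = 2595 log10(1 + f/700), which this page does not confirm as the formula the block uses, and all names are hypothetical. The range [0, fs/2] matches the default range of [0,22050] at the default sample rate of 44.1e3 Hz.

```python
import math


def hz_to_mel(f_hz):
    # Assumed conversion; other mel formulas exist.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(freq_range_hz, num_bands):
    """Band edges in Hz, equally spaced on the mel scale."""
    low_mel, high_mel = (hz_to_mel(f) for f in freq_range_hz)
    # num_bands overlapping filters need num_bands + 2 edge frequencies.
    step = (high_mel - low_mel) / (num_bands + 1)
    return [mel_to_hz(low_mel + i * step) for i in range(num_bands + 2)]


sample_rate = 44.1e3
auto_range = (0.0, sample_rate / 2)  # [0, fs/2]
edges = mel_band_edges(auto_range, 32)
print(len(edges))  # 34
```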
Filter bank design domain — Design domain of mel filter bank
linear (default)
Design domain of the mel filter bank. The default is linear.
Filter bank normalization — Normalization technique for filter bank
bandwidth (default) | area | none
Normalization technique that the block uses for the filter bank weights, specified as one of these values:
bandwidth: Normalize the weights of each bandpass filter by the corresponding bandwidth of the filter.
area: Normalize the weights of each bandpass filter by the corresponding area of the bandpass filter.
none: The block does not normalize the weights of the filters.
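The three options can be compared with the Python sketch below, which normalizes the weights of a single filter. The function name is hypothetical, and the exact scale factors that the block applies are not given on this page. Here bandwidth divides by the width of the band in Hz and area divides by the sum of the weights, which is only one plausible reading.

```python
def normalize_filter(weights, band_low_hz, band_high_hz, mode):
    """Scale one bandpass filter's weights according to the chosen mode."""
    if mode == "bandwidth":
        scale = band_high_hz - band_low_hz  # assumed: width of the band in Hz
    elif mode == "area":
        scale = sum(weights)                # assumed: discrete area of the filter
    elif mode == "none":
        scale = 1.0
    else:
        raise ValueError("mode must be 'bandwidth', 'area', or 'none'")
    return [w / scale for w in weights]


triangle = [0.0, 0.5, 1.0, 0.5, 0.0]
print(normalize_filter(triangle, 100.0, 300.0, "area"))
# [0.0, 0.25, 0.5, 0.25, 0.0]
```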
Normalize window — Normalize analysis window
on (default) | off
When you select this parameter, the block applies window normalization.
Spectrum type — Type of spectrum
power (default)
Type of spectrum. The default is power.
Auto-determine FFT length — Automatically determine FFT length
on (default) | off
When you select this parameter, the block automatically sets the FFT length to the window length. The window length is determined by the Window parameter.
FFT length — Number of DFT points
1024 (default) | positive integer
Number of points used to calculate the DFT, specified as a positive integer.
To enable this parameter, clear the Auto-determine FFT length parameter.
Mel-frequency cepstral coefficients are popular features extracted from speech signals for use in recognition tasks. In the source-filter model of speech, cepstral coefficients are understood to represent the filter (vocal tract). The vocal tract frequency response is relatively smooth, whereas the source of voiced speech can be modeled as an impulse train. As a result, the vocal tract can be estimated by the spectral envelope of a speech segment.
The motivating idea of mel-frequency cepstral coefficients is to compress information about the vocal tract (smoothed spectrum) into a small number of coefficients based on an understanding of the cochlea. Although there is no hard standard for calculating the coefficients, the basic steps are outlined by the diagram.
The delta of an audio feature x is a least-squares approximation of the local slope of a region centered on sample x(k), which includes M samples before the current sample and M samples after the current sample.
The delta window length defines the length of the region from –M to M.
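The least-squares slope described above has a closed form: for a region from -M to M, the slope is the sum of k·x(t + k) divided by the sum of k², with both sums over k = -M, ..., M. The Python sketch below applies that formula to one coefficient track. The function name is hypothetical, and the handling of the first and last M samples, which are skipped here, is an assumption rather than the block's documented behavior.

```python
def delta(track, window_length):
    """Least-squares local slope of a 1-D feature track (interior points only)."""
    if window_length < 3 or window_length % 2 == 0:
        raise ValueError("window length must be an odd integer greater than 2")
    m = (window_length - 1) // 2  # region spans -m..m around each sample
    denominator = sum(k * k for k in range(-m, m + 1))
    slopes = []
    for t in range(m, len(track) - m):
        numerator = sum(k * track[t + k] for k in range(-m, m + 1))
        slopes.append(numerator / denominator)
    return slopes


# A straight line with slope 2 gives a constant delta of 2.
line = [2 * i + 1 for i in range(20)]
print(delta(line, 9)[:3])  # [2.0, 2.0, 2.0]
```

Under the same assumptions, the delta-delta can be sketched by applying `delta` to the output of `delta`.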
C/C++ Code Generation
Generate C and C++ code using Simulink® Coder™.
Introduced in R2022b