Sound Classifier
Libraries:
Audio Toolbox / Deep Learning
Description
The Sound Classifier block uses YAMNet to classify audio segments into sound classes described by the AudioSet ontology. The block combines the necessary audio preprocessing with YAMNet network inference. It returns the predicted sound label, the predicted scores for each sound class, and the class labels that correspond to those scores.
Examples
Detect Music in Simulink Using YAMNet
Detect music using the Sound Classifier block in Simulink®.
Compare Sound Classifier Block with Equivalent YAMNet Blocks
Show that the Sound Classifier block is equivalent to the cascade of the YAMNet Preprocess block and the YAMNet block.
Ports
Input
audioIn — Sound data
column vector
Sound data to classify, specified as a one-channel signal (column vector). If Sample rate of input signal (Hz) is 16e3, there are no restrictions on the input frame length. If Sample rate of input signal (Hz) is different from 16e3, then the input frame length must be a multiple of the decimation factor of the resampling operation that the block performs. If the input frame length does not satisfy this condition, the block throws an error message with information on the decimation factor.
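The frame-length restriction can be checked before running a model. The following is a minimal sketch, assuming the block resamples by the reduced rational ratio 16000/fs, so that the decimation factor is the denominator of that fraction. The block's actual resampler may approximate the ratio differently, and the helper names are hypothetical.

```python
from fractions import Fraction

TARGET_RATE_HZ = 16000  # sample rate that YAMNet supports


def decimation_factor(sample_rate_hz: int) -> int:
    """Return the assumed decimation factor for resampling to 16 kHz.

    Assumption: resampling uses the reduced ratio 16000/fs = L/M,
    and M (the denominator) is the decimation factor.
    """
    ratio = Fraction(TARGET_RATE_HZ, sample_rate_hz)
    return ratio.denominator


def is_valid_frame_length(frame_length: int, sample_rate_hz: int) -> bool:
    """Check that the frame length is a multiple of the decimation factor."""
    return frame_length % decimation_factor(sample_rate_hz) == 0


# Hypothetical example: 44.1 kHz input reduces to the ratio 160/441.
factor_44k = decimation_factor(44100)
frame_ok = is_valid_frame_length(882, 44100)
frame_bad = is_valid_frame_length(1024, 44100)
```

Under this assumption, a 44.1 kHz input needs frame lengths that are multiples of 441, while a 16 kHz input accepts any frame length because the factor is 1.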
Data Types: single | double
Output
sound — Predicted sound label
enumerated scalar
Predicted sound label, returned as an enumerated scalar.
Data Types: enumerated
scores — Predicted activations or scores
vector
Predicted activation or score values for each supported sound label, returned as a 1-by-521 vector, where 521 is the number of classes in YAMNet.
Data Types: single
labels — Class labels for predicted scores
vector
Class labels for predicted scores, returned as a 1-by-521 vector.
Data Types: enumerated
Parameters
Sample rate of input signal (Hz) — Sample rate of input signal in Hz
16e3 (default) | positive scalar
Specify the sample rate of the input signal as a positive scalar in Hz. If the sample rate is different from 16e3, then the block resamples the signal to 16e3, which is the sample rate that YAMNet supports.
Data Types: single | double
Overlap percentage (%) — Overlap percentage between consecutive mel spectrograms
50 (default) | [0 100)
Specify the overlap percentage between consecutive mel spectrograms as a scalar in the range [0 100).
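To see what a given overlap means in time, the percentage can be converted to a hop between consecutive spectrograms. This is a minimal sketch that relies on the 96 frames per spectrogram and the 10 ms frame hop described in the Algorithms section. The rounding rule and the function name are assumptions, not the block's documented behavior.

```python
FRAMES_PER_SPECTROGRAM = 96  # 10 ms frames in each mel spectrogram
FRAME_HOP_SECONDS = 0.010    # hop between STFT frames


def spectrogram_hop_frames(overlap_percent: float) -> int:
    """Convert an overlap percentage in [0, 100) to a hop in frames.

    Assumption: the hop is rounded to the nearest frame and is at least 1.
    """
    if not 0 <= overlap_percent < 100:
        raise ValueError("overlap_percent must be in the range [0, 100)")
    hop = round(FRAMES_PER_SPECTROGRAM * (1 - overlap_percent / 100))
    return max(1, hop)


# With the default 50% overlap, spectrograms start 48 frames apart.
default_hop = spectrogram_hop_frames(50)
seconds_between_predictions = default_hop * FRAME_HOP_SECONDS
```

Under these assumptions the default setting yields a new spectrogram, and therefore a new prediction, roughly every 0.48 s of audio. Higher overlap gives more frequent predictions at a higher computational cost.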
Data Types: single | double
Classification — Select to output sound classification
on (default) | off
Enable the output port sound, which outputs the classified sound.
Predictions — Output all scores and associated labels
off (default) | on
Enable the output ports scores and labels, which output all predicted scores and associated class labels.
Algorithms
The Sound Classifier block algorithm consists of two steps:
Preprocessing: YAMNet-specific preprocessing that generates mel spectrograms.
Prediction: Predicts the sound, scores, and labels of the input signal using the YAMNet sound classification network.
Preprocessing
Cast audioIn to single and resample to 16 kHz.
Compute the one-sided short-time Fourier transform (STFT) using a 25 ms periodic Hann window (400 samples) with a 10 ms hop (160 samples) and a 512-point DFT.
Convert the complex spectral values to magnitude and discard phase information.
Pass the one-sided magnitude STFTs through a 64-band mel-spaced filter bank. Doing so converts the 257-length STFT vectors to 64-length vectors in the mel scale.
Convert the 64-length vectors to a log scale.
Buffer the vectors into outputs of size 96-by-64, where 96 is the number of 10 ms frames in each mel spectrogram and 64 is the number of mel bands. The overlap between consecutive 96-by-64 mel spectrograms is determined by the value of the Overlap percentage (%) parameter.
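The sizes in the steps above can be traced with some simple bookkeeping. The following sketch counts STFT frames and groups per-frame vectors into overlapping 96-frame blocks. It assumes no padding of the input signal, takes the hop between blocks as a given number of frames, and uses hypothetical function names; it does not compute the STFT or the mel filter bank itself.

```python
WINDOW_SAMPLES = 400            # 25 ms at 16 kHz
HOP_SAMPLES = 160               # 10 ms at 16 kHz
FFT_LENGTH = 512
NUM_BINS = FFT_LENGTH // 2 + 1  # one-sided STFT length (257)
NUM_MEL_BANDS = 64
FRAMES_PER_SPECTROGRAM = 96


def num_stft_frames(num_samples: int) -> int:
    """Count full STFT frames in a 16 kHz signal (assumes no padding)."""
    if num_samples < WINDOW_SAMPLES:
        return 0
    return 1 + (num_samples - WINDOW_SAMPLES) // HOP_SAMPLES


def buffer_frames(frames, hop_frames):
    """Group 64-length frame vectors into overlapping 96-by-64 blocks."""
    if hop_frames < 1:
        raise ValueError("hop_frames must be at least 1")
    blocks = []
    start = 0
    while start + FRAMES_PER_SPECTROGRAM <= len(frames):
        blocks.append(frames[start:start + FRAMES_PER_SPECTROGRAM])
        start += hop_frames
    return blocks


# Two seconds of 16 kHz audio, with placeholder log-mel vectors of zeros.
n_frames = num_stft_frames(2 * 16000)
log_mel_frames = [[0.0] * NUM_MEL_BANDS for _ in range(n_frames)]
spectrograms = buffer_frames(log_mel_frames, hop_frames=48)  # 50% overlap
```

In this example, 32,000 samples give 198 frames, which fit three complete 96-by-64 spectrograms at a 48-frame hop. Leftover frames that do not fill a block are simply dropped here, whereas a streaming block would typically carry them over to the next input frame.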
Prediction
These 96-by-64 spectrograms are passed to the YAMNet block. The YAMNet block has a maximum of three outputs:
sound: The label of the most likely sound. You get one "sound" for each 96-by-64 spectrogram input.
scores: 1-by-521 vectors, with a score value for each supported sound label.
labels: 1-by-521 vectors containing the sound labels.
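The relationship between the three outputs can be illustrated with a small sketch. It assumes that the sound output corresponds to the label with the highest score, which is consistent with "the most likely sound" but is not stated explicitly here. The function name is hypothetical, and the three-class list stands in for the full 521 labels.

```python
def top_label(scores, labels):
    """Return the (label, score) pair with the highest score.

    Assumption: the predicted sound is the label at the index of the
    maximum score.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("scores must not be empty")
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores[best]


# Shortened illustrative example; the real outputs have 521 entries.
example_labels = ["Speech", "Music", "Silence"]
example_scores = [0.1, 0.7, 0.2]
sound, score = top_label(example_scores, example_labels)
```

Keeping scores and labels as parallel vectors also makes it straightforward to inspect the runner-up classes, for example by sorting the indices by score.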
Extended Capabilities
C/C++ Code Generation
Generate C and C++ code using Simulink® Coder™.
Usage notes and limitations:
To generate generic C code that does not depend on third-party libraries, in the Configuration Parameters > Code Generation general category, set the Language parameter to C.
To generate C++ code, in the Configuration Parameters > Code Generation general category, set the Language parameter to C++. To specify the target library for code generation, in the Code Generation > Interface category, set the Target Library parameter. Setting this parameter to None generates generic C++ code that does not depend on third-party libraries.
For a list of networks and layers supported for code generation, see Networks and Layers Supported for Code Generation (MATLAB Coder).
Version History
Introduced in R2021b