ivector
Description
w = ivector(ivs,data) extracts i-vectors from the input data.
w = ivector(ivs,data,Name,Value) specifies additional options using name-value arguments. You can choose the hardware resource for extracting i-vectors and whether to apply the projection matrix from trainClassifier.
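For example, a minimal sketch of the two call patterns (ivs is assumed to be an ivectorSystem that has already been trained with trainExtractor, and audioIn a mono audio signal at the system sample rate):
% Extract i-vectors using default options.
w = ivector(ivs,audioIn);

% Specify the hardware resource and skip the projection matrix learned by trainClassifier.
w = ivector(ivs,audioIn, ...
    ExecutionEnvironment="cpu", ...
    ApplyProjectionMatrix=false);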
Examples
Train Word Recognition System
An i-vector system consists of a trainable front end that learns how to extract i-vectors based on unlabeled data, and a trainable back end that learns how to classify i-vectors based on labeled data. In this example, you apply an i-vector system to the task of word recognition. First, evaluate the accuracy of the i-vector system using the classifiers included in a traditional i-vector system: probabilistic linear discriminant analysis (PLDA) and cosine similarity scoring (CSS). Next, evaluate the accuracy of the system if you replace the classifier with a fully connected deep learning network or a k-nearest neighbor (KNN) classifier.
Create Training and Validation Sets
Download the Free Spoken Digit Dataset (FSDD) [1]. FSDD consists of short audio files with spoken digits (0-9).
loc = matlab.internal.examples.downloadSupportFile("audio","FSDD.zip");
unzip(loc,pwd)
Create an audioDatastore to point to the recordings. Get the sample rate of the data set.
ads = audioDatastore(pwd,IncludeSubfolders=true);
[~,adsInfo] = read(ads);
fs = adsInfo.SampleRate;
The first element of the file names is the digit spoken in the file. Get the first element of the file names, convert them to categorical, and then set the Labels property of the audioDatastore.
[~,filenames] = cellfun(@(x)fileparts(x),ads.Files,UniformOutput=false);
ads.Labels = categorical(string(cellfun(@(x)x(1),filenames)));
To split the datastore into a development set and a validation set, use splitEachLabel. Allocate 80% of the data for development and the remaining 20% for validation.
[adsTrain,adsValidation] = splitEachLabel(ads,0.8);
Evaluate Traditional i-vector Backend Performance
Create an i-vector system that expects audio input at a sample rate of 8 kHz and does not perform speech detection.
wordRecognizer = ivectorSystem(DetectSpeech=false,SampleRate=fs)
wordRecognizer = 
  ivectorSystem with properties:

         InputType: 'audio'
        SampleRate: 8000
      DetectSpeech: 0
           Verbose: 1
    EnrolledLabels: [0×2 table]
Train the i-vector extractor using the data in the training set.
trainExtractor(wordRecognizer,adsTrain, ...
    UBMNumComponents=64, ...
    UBMNumIterations=5, ...
    ...
    TVSRank=32, ...
    TVSNumIterations=5);
Calculating standardization factors ....done.
Training universal background model ........done.
Training total variability space ........done.
i-vector extractor training complete.
Train the i-vector classifier using the data in the training data set and the corresponding labels.
trainClassifier(wordRecognizer,adsTrain,adsTrain.Labels, ...
    NumEigenvectors=10, ...
    ...
    PLDANumDimensions=10, ...
    PLDANumIterations=5);
Extracting i-vectors ...done.
Training projection matrix .....done.
Training PLDA model ........done.
i-vector classifier training complete.
Calibrate the scores output by wordRecognizer so they can be interpreted as a measure of confidence in a positive decision. Enroll labels into the system using the entire training set.
calibrate(wordRecognizer,adsTrain,adsTrain.Labels)
Extracting i-vectors ...done.
Calibrating CSS scorer ...done.
Calibrating PLDA scorer ...done.
Calibration complete.
enroll(wordRecognizer,adsTrain,adsTrain.Labels)
Extracting i-vectors ...done.
Enrolling i-vectors .............done.
Enrollment complete.
In a loop, read audio from the validation datastore, identify the most likely word present according to the specified scorer, and save the prediction for analysis.
trueLabels = adsValidation.Labels;
predictedLabels = trueLabels;
reset(adsValidation)

scorer = "plda";
for ii = 1:numel(trueLabels)
    audioIn = read(adsValidation);
    to = identify(wordRecognizer,audioIn,scorer);
    predictedLabels(ii) = to.Label(1);
end
Display a confusion chart of the i-vector system's performance on the validation set.
figure(Units="normalized",Position=[0.2 0.2 0.5 0.5])
confusionchart(trueLabels,predictedLabels, ...
    ColumnSummary="column-normalized", ...
    RowSummary="row-normalized", ...
    Title=sprintf('Accuracy = %0.2f (%%)',100*mean(predictedLabels==trueLabels)))
Evaluate Deep Learning Backend Performance
Next, extract i-vectors from the training and validation sets, and then train a fully connected network that uses the i-vectors as input.
ivectorsTrain = (ivector(wordRecognizer,adsTrain))';
ivectorsValidation = (ivector(wordRecognizer,adsValidation))';
Define a fully connected network.
layers = [ ...
    featureInputLayer(size(ivectorsTrain,2),Normalization="none")
    fullyConnectedLayer(128)
    dropoutLayer(0.4)
    fullyConnectedLayer(256)
    dropoutLayer(0.4)
    fullyConnectedLayer(256)
    dropoutLayer(0.4)
    fullyConnectedLayer(128)
    dropoutLayer(0.4)
    fullyConnectedLayer(numel(unique(adsTrain.Labels)))
    softmaxLayer
    classificationLayer];
Define training parameters.
miniBatchSize = 256;
validationFrequency = floor(numel(adsTrain.Labels)/miniBatchSize);
options = trainingOptions("adam", ...
    MaxEpochs=10, ...
    MiniBatchSize=miniBatchSize, ...
    Plots="training-progress", ...
    Verbose=false, ...
    Shuffle="every-epoch", ...
    ValidationData={ivectorsValidation,adsValidation.Labels}, ...
    ValidationFrequency=validationFrequency);
Train the network.
net = trainNetwork(ivectorsTrain,adsTrain.Labels,layers,options);
Evaluate the performance of the deep learning backend using a confusion chart.
predictedLabels = classify(net,ivectorsValidation);
trueLabels = adsValidation.Labels;

figure(Units="normalized",Position=[0.2 0.2 0.5 0.5])
confusionchart(trueLabels,predictedLabels, ...
    ColumnSummary="column-normalized", ...
    RowSummary="row-normalized", ...
    Title=sprintf('Accuracy = %0.2f (%%)',100*mean(predictedLabels==trueLabels)))
Evaluate KNN Backend Performance
Train and evaluate i-vectors with a k-nearest neighbor (KNN) backend.
Use fitcknn to train a KNN model.
classificationKNN = fitcknn( ...
    ivectorsTrain, ...
    adsTrain.Labels, ...
    Distance="Euclidean", ...
    Exponent=[], ...
    NumNeighbors=10, ...
    DistanceWeight="SquaredInverse", ...
    Standardize=true, ...
    ClassNames=unique(adsTrain.Labels));
Evaluate the KNN backend.
predictedLabels = predict(classificationKNN,ivectorsValidation);
trueLabels = adsValidation.Labels;

figure(Units="normalized",Position=[0.2 0.2 0.5 0.5])
confusionchart(trueLabels,predictedLabels, ...
    ColumnSummary="column-normalized", ...
    RowSummary="row-normalized", ...
    Title=sprintf('Accuracy = %0.2f (%%)',100*mean(predictedLabels==trueLabels)))
References
[1] Jakobovski. "Jakobovski/Free-Spoken-Digit-Dataset." GitHub, May 30, 2019. https://github.com/Jakobovski/free-spoken-digit-dataset.
Input Arguments
ivs — i-vector system
ivectorSystem object

i-vector system, specified as an object of type ivectorSystem.
data — Data to transform
column vector | cell array | audioDatastore | signalDatastore | TransformedDatastore

Data to transform, specified as a cell array or as an audioDatastore, signalDatastore, or TransformedDatastore object.
If InputType is set to "audio" when the i-vector system is created, specify data as one of these:
- A column vector with underlying type single or double.
- A cell array of single-channel audio signals, each specified as a column vector with underlying type single or double.
- An audioDatastore object or a signalDatastore object that points to a data set of mono audio signals.
- A TransformedDatastore with an underlying audioDatastore or signalDatastore that points to a data set of mono audio signals. The output from calls to read from the transform datastore must be mono audio signals with underlying data type single or double.
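For illustration, a brief sketch of the cell-array form (ivs is assumed to be an ivectorSystem with InputType "audio" that has already been trained with trainExtractor; the signals are placeholders):
% Two hypothetical mono recordings at the sample rate the system expects.
fs = 8000;
x1 = randn(fs,1,"single");      % 1 second stand-in signal
x2 = randn(2*fs,1,"single");    % 2 second stand-in signal

% One i-vector is extracted per cell array element.
w = ivector(ivs,{x1,x2});       % w has one column per input signal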
If InputType is set to "features" when the i-vector system is created, specify data as one of these:
- A matrix containing the audio features.
- A cell array of matrices containing the audio features.
- An audioDatastore, signalDatastore, or TransformedDatastore whose read function returns a feature matrix.

The feature matrices must consist of audio features with underlying type single or double, where the number of features (columns) is locked the first time trainExtractor is called and the number of hops (rows) is variable-sized. The number of features input in any subsequent calls to any of the object functions must be equal to the number of features used when calling trainExtractor.
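As a sketch of the expected feature layout, one possible front end is audioFeatureExtractor (ivs is assumed to be an ivectorSystem created with InputType="features" and trained on features with the same number of columns; the signals are placeholders):
% One possible feature front end; any numHops-by-numFeatures matrices work.
fs = 8000;
afe = audioFeatureExtractor(SampleRate=fs,mfcc=true);

% Hypothetical audio used only for illustration.
x1 = randn(fs,1);
x2 = randn(2*fs,1);

% The hop count (rows) can differ between signals, but the feature count (columns)
% must match the count used when trainExtractor was called.
f1 = extract(afe,x1);
f2 = extract(afe,x2);

w = ivector(ivs,{f1,f2});   % one i-vector per feature matrix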
Data Types: cell | audioDatastore | signalDatastore
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: ivector(ivs,data,ApplyProjectionMatrix=false,ExecutionEnvironment="parallel")
ApplyProjectionMatrix — Option to apply projection matrix
true | false

Option to apply projection matrix, specified as a logical value. This argument specifies whether to apply the linear discriminant analysis (LDA) and within-class covariance normalization (WCCN) projection matrix determined using trainClassifier.
- If the projection matrix was trained, then ApplyProjectionMatrix defaults to true.
- If the projection matrix was not trained, then ApplyProjectionMatrix defaults to false and cannot be set to true.
Data Types: logical
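For example, a brief sketch comparing the two settings (assumes the projection matrix was trained by trainClassifier, so it is applied by default):
% Raw i-vectors, without the LDA/WCCN projection learned by trainClassifier.
wRaw = ivector(ivs,data,ApplyProjectionMatrix=false);

% Projected i-vectors (the default when the projection matrix exists).
wProjected = ivector(ivs,data);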
ExecutionEnvironment — Hardware resource for execution
"auto" (default) | "cpu" | "gpu" | "multi-gpu" | "parallel"
Hardware resource for execution, specified as one of these:
- "auto" — Use the GPU if it is available. Otherwise, use the CPU.
- "cpu" — Use the CPU.
- "gpu" — Use the GPU. This option requires Parallel Computing Toolbox™.
- "multi-gpu" — Use multiple GPUs on one machine, using a local parallel pool based on your default cluster profile. If there is no current parallel pool, the software starts a parallel pool with pool size equal to the number of available GPUs. This option requires Parallel Computing Toolbox.
- "parallel" — Use a local or remote parallel pool based on your default cluster profile. If there is no current parallel pool, the software starts one using the default cluster profile. If the pool has access to GPUs, then only workers with a unique GPU perform computation. If the pool does not have GPUs, then computation takes place on all available CPU workers. This option requires Parallel Computing Toolbox.
Data Types: char | string
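For instance, a sketch of requesting the GPU explicitly (assumes a supported GPU and Parallel Computing Toolbox are available):
% Extract i-vectors on the GPU instead of letting "auto" decide.
w = ivector(ivs,data,ExecutionEnvironment="gpu");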
DispatchInBackground — Option to use prefetch queuing
false (default) | true
Option to use prefetch queuing when reading from a datastore, specified as a logical value. This argument requires Parallel Computing Toolbox.
Data Types: logical
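A sketch of combining background dispatch with a datastore input (ads is assumed to be an audioDatastore of mono recordings compatible with the trained system):
% Prefetch datastore reads in the background while i-vectors are extracted.
w = ivector(ivs,ads,DispatchInBackground=true);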
Output Arguments
w — i-vectors
column vector | matrix

Extracted i-vectors, returned as a column vector or a matrix. The number of columns of w is equal to the number of input signals. The number of rows of w is the dimension of the i-vector.
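To make the layout concrete, a short sketch (the signals are placeholders; the i-vector dimension depends on TVSRank and on whether the projection matrix is applied):
% Three hypothetical signals in a cell array yield a matrix with three columns.
w = ivector(ivs,{x1,x2,x3});
size(w)    % (i-vector dimension)-by-3, one i-vector per column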
Version History
Introduced in R2021a
See Also
trainExtractor | trainClassifier | calibrate | enroll | unenroll | detectionErrorTradeoff | verify | identify | info | addInfoHeader | release | ivectorSystem | speakerRecognition