identify
Syntax
Description
Examples
Train Speaker Identification System
This example uses a 1.36 GB subset of the Common Voice data set from Mozilla [1]. The data set contains 48 kHz recordings of subjects speaking short sentences.
Download the data set if it doesn't already exist and unzip it into tempdir.
downloadFolder = matlab.internal.examples.downloadSupportFile("audio","commonvoice.zip");
dataFolder = tempdir;
unzip(downloadFolder,dataFolder);
Ingest the training set using audioDatastore and, to speed up this example, keep only 20% of the files for each speaker.
trainTable = readtable(fullfile(dataFolder,"commonvoice","train","train.tsv"),FileType="text",Delimiter="tab");
adsTrain = audioDatastore(append(fullfile(dataFolder,"commonvoice","train","clips",filesep),trainTable.path,".wav"));
idx = splitlabels(trainTable.client_id,0.2);
adsTrain = subset(adsTrain,idx{1});
trainLabels = trainTable.client_id(idx{1});
Ingest the validation set using audioDatastore.
valTable = readtable(fullfile(dataFolder,"commonvoice","validation","validation.tsv"),FileType="text",Delimiter="tab");
valLabels = valTable.client_id;
adsVal = audioDatastore(append(fullfile(dataFolder,"commonvoice","validation","clips",filesep),valTable.path,".wav"));
Split the validation data set into enroll and test sets. Use two utterances per speaker for enrollment and the remaining utterances for the test set. Exclude any speakers with fewer than five utterances. Generally, the more utterances you use for enrollment, the better the system performs. However, most practical applications are limited to a small set of enrollment utterances.
labelCounts = countlabels(valLabels);
labelsToExclude = labelCounts.Label(labelCounts.Count<5);
idxs = splitlabels(valLabels,2,Exclude=labelsToExclude);
adsEnroll = subset(adsVal,idxs{1});
enrollLabels = valLabels(idxs{1});
adsTest = subset(adsVal,idxs{2});
testLabels = valLabels(idxs{2});
Create an i-vector system that accepts feature input.
fs = 48e3;
iv = ivectorSystem(SampleRate=fs,InputType="features");
Create an audioFeatureExtractor object to extract the gammatone cepstral coefficients (GTCC), the delta GTCC, the delta-delta GTCC, and the pitch from 50 ms periodic Hann windows with 45 ms overlap.
afe = audioFeatureExtractor( ...
    SampleRate=fs, ...
    Window=hann(round(0.05*fs),"periodic"), ...
    OverlapLength=round(0.045*fs), ...
    gtcc=true,gtccDelta=true,gtccDeltaDelta=true,pitch=true);
Extract features from the train and enroll datastores.
xTrain = extract(afe,adsTrain);
xEnroll = extract(afe,adsEnroll);
Train both the extractor and classifier using the training set.
trainExtractor(iv,xTrain, ...
    UBMNumComponents=64, ...
    UBMNumIterations=5, ...
    TVSRank=32, ...
    TVSNumIterations=3);
Calculating standardization factors .....done.
Training universal background model ........done.
Training total variability space ......done.
i-vector extractor training complete.
trainClassifier(iv,xTrain,trainLabels, ...
    NumEigenvectors=16, ...
    PLDANumDimensions=16, ...
    PLDANumIterations=5);
Extracting i-vectors ...done.
Training projection matrix .....done.
Training PLDA model ........done.
i-vector classifier training complete.
To calibrate the system so that scores can be interpreted as a measure of confidence in a positive decision, use calibrate.
calibrate(iv,xTrain,trainLabels)
Extracting i-vectors ...done.
Calibrating CSS scorer ...done.
Calibrating PLDA scorer ...done.
Calibration complete.
Enroll the speakers from the enrollment set.
enroll(iv,xEnroll,enrollLabels)
Extracting i-vectors ...done.
Enrolling i-vectors ...................done.
Enrollment complete.
Evaluate the file-level prediction accuracy on the test set.
numCorrect = 0;
reset(adsTest)
for index = 1:numel(adsTest.Files)
    features = extract(afe,read(adsTest));
    results = identify(iv,features);
    trueLabel = testLabels(index);
    predictedLabel = results.Label(1);
    isPredictionCorrect = trueLabel==predictedLabel;
    numCorrect = numCorrect + isPredictionCorrect;
end
display("File Accuracy: " + round(100*numCorrect/numel(adsTest.Files),2) + " (%)")
"File Accuracy: 97.92 (%)"
References
Input Arguments
ivs — i-vector system
ivectorSystem object
i-vector system, specified as an object of type ivectorSystem.
data — Data to identify
column vector | matrix
Data to identify, specified as a column vector representing a single-channel (mono) audio signal or a matrix of audio features.
- If InputType is set to "audio" when the i-vector system is created, data must be a column vector with underlying type single or double.
- If InputType is set to "features" when the i-vector system is created, data must be a matrix with underlying type single or double. The matrix must consist of audio features where the number of features (columns) is locked the first time trainExtractor is called and the number of hops (rows) is variable-sized.
Data Types: single | double
scorer — Scoring algorithm
"plda" | "css"
Scoring algorithm used by the i-vector system, specified as "plda", which corresponds to probabilistic linear discriminant analysis (PLDA), or "css", which corresponds to cosine similarity score (CSS).
To use "plda", you must train the PLDA model using trainClassifier. If the PLDA model has been trained, then scorer defaults to "plda". Otherwise, scorer defaults to "css".
Data Types: char | string
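As a rough illustration of what a cosine similarity score measures, the following sketch computes it for two made-up vectors in plain MATLAB. This is not the toolbox implementation: the variable names and values are hypothetical, and the scores that identify reports may involve further processing, such as calibration.

```matlab
% Hypothetical i-vectors (column vectors); the values are made up.
enrolledVector = [1; 2; 2];
testVector     = [2; 4; 4];    % points in the same direction as enrolledVector
otherVector    = [2; -1; 0];   % orthogonal to enrolledVector

% Cosine similarity: dot product normalized by the two vector lengths.
cssSame = dot(enrolledVector,testVector)/(norm(enrolledVector)*norm(testVector));
cssOrthogonal = dot(enrolledVector,otherVector)/(norm(enrolledVector)*norm(otherVector));
```

Vectors that point in the same direction score 1 regardless of their lengths, and orthogonal vectors score 0.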
N — Number of candidates
positive scalar
Number of candidates to return in tableOut, specified as a positive scalar.
Note
If you request a number of candidates greater than the number of labels enrolled in the i-vector system, then all candidates are returned. If unspecified, the number of candidates defaults to the number of enrolled labels.
Data Types: single | double
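The capping behavior described in the note can be sketched in plain MATLAB as follows. This only illustrates the selection logic and is not the toolbox code: the labels, scores, and the Score variable name are assumptions made for the sketch (the Label variable name matches the one used in the example above).

```matlab
% Hypothetical enrolled labels and scores; the values are made up.
labels = ["spk1"; "spk2"; "spk3"];
scores = [0.20; 0.90; 0.50];
N = 5;    % more candidates requested than there are enrolled labels

% Sort from highest to lowest score, then keep at most N rows.
[sortedScores,order] = sort(scores,"descend");
numRows = min(N,numel(labels));
candidates = table(labels(order(1:numRows)),sortedScores(1:numRows), ...
    'VariableNames',{'Label','Score'});
```

Because N exceeds the number of enrolled labels, the sketch returns all three candidates, with the highest-scoring label in the first row.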
Output Arguments
tableOut — Score table
table
Candidate labels and corresponding scores, returned as a table. The number of rows of tableOut is equal to N, the number of candidates. The candidates are sorted from most to least confident, so the first row holds the top prediction.
Data Types: table
Version History
Introduced in R2021a
R2022a: identify throws warning if scores are not calibrated
Starting in R2022a, the identify function throws a warning if the scores from the i-vector system are not calibrated. Use calibrate to calibrate the scores.
See Also
trainExtractor | trainClassifier | calibrate | unenroll | enroll | detectionErrorTradeoff | verify | ivector | info | addInfoHeader | release | ivectorSystem | speakerRecognition