How to find the important features among the many features of an MFCC speech sample?

Question

Daud on 20 Sep 2012


https://la.mathworks.com/matlabcentral/answers/48633-how-to-find-the-important-features-from-many-features-of-a-mffc-speech-sample

I am doing speech classification of four words: go, stop, left and right. I am using MFCCs for feature extraction and a neural network as the classifier.

The problem I am facing is that the MFCC feature matrix of a single sample is large, around 124x13, where 124 is the number of frames and 13 is the number of mel-frequency cepstral coefficients per frame.

If I reshape it into a column vector it becomes 1612x1, which is huge.

So how can I reduce this matrix by keeping only the most important features?


Answer 1

Ilya on 20 Sep 2012


https://la.mathworks.com/matlabcentral/answers/48633-how-to-find-the-important-features-from-many-features-of-a-mffc-speech-sample#answer_59492

Edited: Ilya on 20 Sep 2012

I described the feature ranking/selection tools available in Statistics Toolbox here: http://www.mathworks.com/matlabcentral/answers/33808-select-machine-learning-features-in-matlab

With the exception of sequentialfs, all these techniques are based on specific classification or regression algorithms. If you select features using ensembles of decision trees, for instance, there is no guarantee that the selected set will also be optimal for your neural net. sequentialfs, on the other hand, is going to be quite slow for that many features.
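For readers who have not used sequentialfs, here is a minimal sketch of the wrapper idea on a small synthetic problem. The data, the variable names, and the choice of classify (linear discriminant) as the classifier inside the criterion are all assumptions made for illustration; in the real problem the criterion would call whatever classifier you actually intend to use, and with 1612 candidate features it would run far longer than this toy case.

```matlab
% Sketch only: synthetic data, requires Statistics Toolbox.
rng(1);                                   % reproducible random numbers
n = 120; p = 8;                           % hypothetical sizes, not the speech data
X = randn(n, p);                          % observations in rows, features in columns
y = 1 + (X(:,2) + X(:,5) > 0);            % two classes driven by features 2 and 5

% Criterion: number of misclassified test observations for a candidate subset.
% sequentialfs passes only the candidate columns of X to this function.
critfun = @(Xtr, ytr, Xte, yte) sum(yte ~= classify(Xte, Xtr, ytr));

opts = statset('Display', 'off');
inmodel = sequentialfs(critfun, X, y, 'cv', 5, 'options', opts);
selected = find(inmodel)                  % indices of the features that were kept
```

Each step refits the classifier once per remaining candidate feature and per cross-validation fold, which is why the cost grows quickly with the number of features.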

Regularized discriminant analysis with thresholding (released in R2012a) is a fast method suitable for data with thousands of predictors. Here is an example showing how it can be used for feature selection: http://www.mathworks.com/help/stats/discriminant-analysis.html#btaf5dv

I am no expert on speech classification, but generally, if you have 1612 features, you can often get good classification with a simple linear method (which is what discriminant analysis provides).


Daud on 21 Sep 2012

Thanks for your suggestion. I am using MATLAB R2009a; can I use these features?

Ilya on 21 Sep 2012

Edited: Ilya on 21 Sep 2012

Do you know which command?

Also, Stats release notes would help: http://www.mathworks.com/help/stats/release-notes.html


Answer 2

Greg Heath on 21 Sep 2012


https://la.mathworks.com/matlabcentral/answers/48633-how-to-find-the-important-features-from-many-features-of-a-mffc-speech-sample#answer_59592

For a c-class classifier, use a target matrix whose columns are columns of eye(c). The input and target matrices will then have the sizes

[I, N] = size(x) % I = 13, N = 124

[O, N] = size(t) % O = c = 4

Neq = prod(size(t)) % Number of training equations = N*O

[z, meanx, stdx] = zscore(x, 0, 2); % Standardize each input variable (row) across the N columns

[zout, iout] = find(abs(z) > tol) % Outlier check; tol is a threshold you choose

% Decide what to do with outliers (keep, delete or trim). For convenience I will keep the same notation.

MSE00 = mean(var(t, 1, 2)) % Biased mean-squared-error reference

MSE00a = mean(var(t, 0, 2)) % Unbiased MSE reference; "a" = adjusted for the DOF lost when training and testing with the same data

% To get a preliminary feel for the data, you can obtain a linear classifier using slash (mrdivide) and look at the size of the weights.

W = t/[ones(1,N); z];

Nw0 = numel(W) % = (I + 1)*O = Number of estimated weights

y0 = W*[ones(1,N); z]; % Real-valued output

e0 = t - y0; % Error

MSE0 = sse(e0)/Neq % Biased mean-squared error

MSE0a = sse(e0)/(Neq - Nw0) % Unbiased MSE

R20 = 1 - MSE0/MSE00 % R-squared statistic

R2a = 1 - MSE0a/MSE00a % Adjusted R-squared

% Now that you have a good feel for the data, you can quickly use STEPWISEFIT to select input variable subsets for models that are linear in the coefficients.

This should help until you get comfortable with the more complicated SEQUENTIALFS.
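As a rough illustration of the STEPWISEFIT step, here is a sketch on synthetic data. The sizes, the variable names and the response are made up for the example. Note that stepwisefit expects observations in rows, which is the transpose of the variables-in-rows convention used above, and that it fits one response column at a time, so for a classifier you would pass a single row of t, transposed.

```matlab
% Sketch only: synthetic regression data, requires Statistics Toolbox.
rng(0);                                    % reproducible random numbers
n = 200; p = 10;                           % hypothetical sizes
X = randn(n, p);                           % n observations in rows, p candidate inputs
y = 2*X(:,1) - 3*X(:,3) + 0.1*randn(n,1);  % only inputs 1 and 3 carry signal

% inmodel is a logical row vector marking the inputs kept in the final model
[b, se, pval, inmodel] = stepwisefit(X, y, 'display', 'off');
selected = find(inmodel)                   % inputs 1 and 3 should be among these
```

Because entry is decided by a p-value threshold, a few uninformative inputs can also be admitted by chance.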

Hope this helps.

Thank you for officially accepting my answer.

Greg


Daud on 21 Sep 2012

Are you taking N as the number of samples? If so, I am sorry to say you have slightly misunderstood: 124 is the number of frames, and each frame contains 13 features. That is why I reshaped the matrix into a column, which yields 1612x1, i.e. 1612 features for one sample. I have 67 samples for each of the 4 classes, so the total number of samples is 67x4 = 268.

Hence, for all 268 samples, the input matrix is 1612x268.
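To make these sizes concrete, here is a minimal sketch of how such an input matrix and the matching eye(4) target matrix could be assembled. The cell array mfcc, the label ordering and the random placeholder data are assumptions for illustration only; real MFCC matrices would take their place.

```matlab
% Sketch only: random placeholders stand in for real MFCC matrices.
nFrames = 124; nCoef = 13;            % sizes from the question
nPerClass = 67; nClass = 4;           % 67 samples for each of the 4 words
N = nPerClass * nClass;               % 268 samples in total

labels = repmat(1:nClass, nPerClass, 1);
labels = labels(:)';                  % 1x268 class indices, 67 of each (assumed order)

mfcc = cell(1, N);                    % hypothetical container, one matrix per sample
for k = 1:N
    mfcc{k} = randn(nFrames, nCoef);  % placeholder for a real 124x13 MFCC matrix
end

x = zeros(nFrames * nCoef, N);        % 1612x268 input matrix
for k = 1:N
    x(:, k) = mfcc{k}(:);             % column-major: all frames of coefficient 1, then 2, ...
end

I4 = eye(nClass);
t = I4(:, labels);                    % 4x268 target matrix, each column a column of eye(4)

disp(size(x))                         % 1612 268
disp(size(t))                         % 4 268
```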

Greg Heath on 23 Sep 2012

Whoops! I did misread the problem size. It looks like you may have to use a lowpass filter to reduce the number of pixels before selecting a variable subset.

Sorry.

Greg
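One simple way to read the lowpass suggestion, offered here only as a sketch and not as the method Greg had in mind, is to average blocks of consecutive frames. That acts as a crude moving-average lowpass filter followed by downsampling along the time axis. The block length of 4 and the placeholder data are assumptions; with 124 frames this leaves 31 averaged frames, i.e. 31x13 = 403 features per sample instead of 1612.

```matlab
% Sketch only: block-average consecutive frames of one MFCC matrix.
M = randn(124, 13);                       % placeholder for one 124x13 MFCC matrix
blk = 4;                                  % assumed block length (frames per average)
nBlk = floor(size(M, 1) / blk);           % 31 blocks
M = M(1:nBlk*blk, :);                     % drop leftover frames (none in this case)

Mr = reshape(M, blk, nBlk, size(M, 2));   % blk x nBlk x 13, each block along dim 1
Mr = squeeze(mean(Mr, 1));                % 31x13: mean of each block, per coefficient
xr = Mr(:);                               % 403x1 reduced feature vector

disp(size(Mr))                            % 31 13
```

A feature selection method can then be run on the reduced vectors rather than on the full 1612-element columns.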
