Word recognition power frequency domain
Leon Ellis on 11 Nov 2021
(I think this code is best run in a live script.)
Hi, my assignment is word recognition. As I understand it, we must take the sound signal into the power frequency domain and use its local maxima to identify words with the mean squared error method. I've done this: I found the x and y values of the peaks and stored them in an array (and did the same for all the other words I'm comparing the test word against). I want to use the immse function to find the smallest error (which should indicate the most likely word), but the problem is that the saved matrices have different lengths, because the number of peaks in the power frequency domain is different for each word. This means I can't apply the method across different words at all. I'm not aware of any other way to compute the mean squared error, or of another effective way to measure the difference.
The only important parts are at the end, where the values are taken from the final graph: the x and y values of the circled peaks, stored in a 2xn matrix, are what get compared. Hz holds the matrix for the first word and Hz2 holds the matrix for the second word. Comp holds the data for the word that needs to be compared against the other two.
If anyone can suggest an effective way to use the power frequency domain to recognize which word has been said, that would be much appreciated! If you're able to help me get the mean squared error method working, that would help too! The files will be attached. Thanks! (This code is a very simplified version of mine, but it contains all the parts needed for what I'm trying to achieve.)
My code is:
[CompareWord, Fs] = audioread("C:\Users\leone\OneDrive\Desktop\Year 2\Semester 2\EERI 222\Practical1\Sounds\CWord.wav");
%%Gets center x-coordinates of local maximum values.
%% Peak Values;
%%File names to load
%%Load .mat files with the 2xn matrices
Christopher McCausland on 11 Nov 2021
From what I can understand, you are looking for the local maxima, but this returns vectors of varying lengths?
Have you considered constraining these vectors so that you only take a fixed number 'x' of the most prominent points? 'prominence' might also be a good indicator, depending on what the signal looks like.
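As a rough illustration of that suggestion, here is a minimal sketch that keeps a fixed number of the most prominent peaks per word, so every word yields a 2-by-nPeaks matrix and the mean squared error is always defined. It is not your code: the synthetic tones, the sample rate Fs, the value of nPeaks, the normalisation step and all variable names are assumptions made for the example. findpeaks requires the Signal Processing Toolbox.

```matlab
% Sketch only: synthetic tones stand in for the recorded words.
Fs = 8000;                              % assumed sample rate in Hz
t  = (0:Fs-1).' / Fs;                   % one second of samples
wordA   = sin(2*pi*300*t) + 0.5*sin(2*pi*900*t);
wordB   = sin(2*pi*500*t) + 0.5*sin(2*pi*1500*t);
unknown = 0.9*wordA + 0.05*sin(2*pi*1200*t);   % slightly altered copy of wordA

signals = {wordA, wordB, unknown};
nPeaks  = 2;                            % assumed number of peaks kept per word
feats   = cell(size(signals));

for k = 1:numel(signals)
    x = signals{k};
    N = numel(x);                       % even length assumed here
    X = fft(x);
    P = abs(X(1:N/2+1)).^2;             % one-sided power spectrum
    P = P / max(P);                     % normalise so loudness does not dominate
    f = (0:N/2).' * Fs / N;             % frequency axis in Hz

    % Fourth output of findpeaks is the prominence of each peak.
    [pks, locs, ~, prom] = findpeaks(P, f);
    [~, order] = sort(prom, 'descend');
    keep = sort(order(1:min(nPeaks, numel(order))));   % back to ascending frequency

    F = zeros(2, nPeaks);               % stays zero-padded if fewer peaks exist
    F(:, 1:numel(keep)) = [reshape(locs(keep), 1, []); reshape(pks(keep), 1, [])];
    feats{k} = F;
end

% Mean squared error between the unknown word and each reference word.
mseA = mean((feats{3}(:) - feats{1}(:)).^2);
mseB = mean((feats{3}(:) - feats{2}(:)).^2);
[~, best] = min([mseA, mseB]);          % 1 means wordA, 2 means wordB
```

Because the matrices now have equal size, immse(feats{3}, feats{1}) should work as well if you have the Image Processing Toolbox. Note that the frequency row is in Hz while the power row is normalised to 1, so the frequencies dominate the error; you may want to rescale one of the rows, or compare the rows separately, depending on what your signals look like.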