How can I extract the time interval (in milliseconds) between two audio signals?
I have a psychology experiment paradigm in which participants are asked to give a verbal response immediately after they hear a beep. Participants may or may not respond to the beep, and their response may be quick or slow. I need to extract the time between the end of the beep and the start of the verbal response. This interval should be measured in milliseconds, since the total time allowed for each response was 3 seconds (3000 ms). There are hundreds of trials, so I would like to do the extraction automatically. How can I achieve this? Many thanks for any suggestions!

2 comments
dpb
on 25 Oct 2025
Which toolboxes do you have available to use?
Walter Roberson
on 25 Oct 2025
I recommend using the third-party Psychtoolbox for this kind of work.
Answers (2)
Star Strider
on 25 Oct 2025
0 votes
Considering the nature of this problem, probably the best option is to estimate the signal envelopes with the Signal Processing Toolbox envelope function (use the 'peak' option with an appropriate window), decide on a threshold, and measure the time at which the envelope crosses the threshold.
It may be necessary to use a filter to eliminate noise. If you are using the lowpass function (or any of its friends) for this, use the ImpulseResponse='iir' name-value pair for best results.
This approach has worked for me in the past.
It will probably be necessary to experiment to get the result you want.
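A minimal sketch of the three steps described above (filter, envelope, threshold), run on a synthetic burst rather than real data. The sampling rate, the 1000 Hz cutoff, the 200-sample peak separation, and the 0.25 threshold are all assumptions chosen for illustration and would need tuning on actual recordings. The onset estimate can be no finer than the spacing between envelope peaks.

```matlab
% Hypothetical synthetic data: a noisy 300 Hz burst standing in for a voice response
Fs = 8000;                                   % Sampling frequency (Hz), assumed
t  = (0:2*Fs-1).'/Fs;                        % 2 s time vector (s)
rng(0)                                       % Reproducible noise
x  = 0.02*randn(size(t));                    % Background noise
burst = t >= 1.0 & t < 1.5;                  % "Response" occupies 1.0 - 1.5 s
x(burst) = x(burst) + sin(2*pi*300*t(burst));

% 1) Remove out-of-band noise (IIR design, as suggested above)
xf = lowpass(x, 1000, Fs, ImpulseResponse='iir');

% 2) Peak envelope; 200 samples is an assumed minimum peak separation
[eu, ~] = envelope(xf, 200, 'peak');

% 3) First sample at which the upper envelope exceeds the threshold
thr   = 0.25;                                % Assumed value, tune empirically
onset = t(find(eu > thr, 1, 'first'));
fprintf('Onset detected at about %.0f ms\n', 1000*onset)
```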
13 comments
My pleasure!
I would not use the peak values themselves; however, you do need the 'peak' option in your envelope call. If you want to know when the voice response begins, set a threshold and then determine the time at which the voice response envelope (I use the upper envelope here) first crosses that threshold.
Try something like this --
Fs = 44100; % Sampling Frequency (Hz)
L = 5;
t = linspace(0, Fs*L, Fs*L+1).'/Fs;
ts = seconds(t); % Time Vector ('duration' Here)
s = randn(size(t)) .* exp(-(t-2.2).^2*10); % Voice Response Signal
[et,eb] = envelope(s, 1000, 'peak'); % Use 'peak' Option
thrshld = 0.25; % Detection Threshold Value
tidx = find(diff(sign(et - thrshld))); % Approximate Indices of Threshold Crossing
idxrng = tidx(1)+[-1 0 1]; % Index Range For Interpolation
t_exact = interp1(et(idxrng), ts(idxrng), thrshld); % 'Exact' Time At Which The Upper Envelope Crosses The Threshold Value
fprintf('\nResponse envelope crosses detection threshold level at %.3f seconds\n', seconds(t_exact))
figure
plot(ts, s, DisplayName='Response Signal')
hold on
plot(ts, [et eb], LineWidth=2, DisplayName="Envelope")
hold off
grid
xlabel("Time (s)")
ylabel("Amplitude")
yline(thrshld, '--k', "Detection Threshold", DisplayName='Detection Threshold')
xline(t_exact, '-.r', "Response Onset Time", DisplayName="Response Onset Time")
text(t_exact, 1.5, sprintf('%.3f s \\rightarrow',seconds(t_exact)), Horiz='right')
legend(Location='best')
.
Wade
on 27 Oct 2025
Star Strider
on 27 Oct 2025
My pleasure!
- I defined the threshold empirically here. There is usually some noise, even in a filtered signal, so the threshold needs to be greater than the noise level. Beyond that, the lowest threshold that still rejects the noise gives the earliest onset time, and is the one to use. I doubt that there is a mathematical way to determine the best threshold.
- I do not fully understand your experiment. My code measures the time to voice response onset from the beginning of a specific record. It has no idea where the beeps are, so it simply returns the time to the voice response. (This is a simple example, and it could be made as comprehensive as necessary to give you the result you want.) If the beeps are recorded in the same record as the voice response, and all the beeps have the same frequency characteristics (ideally a single frequency), it would be relatively straightforward to separate them from the voice response and compute the times of the beeps and the time of the voice response separately. I would need representative data to explore this.
- I do not have a sample of your signal, so I cannot determine the noise characteristics. I usually use a Fourier transform of a signal to design the filter cutoffs, and determine the sort of filter I want (usually lowpass or bandpass).
I do not consider any questions to be 'stupid'! I will do my best to answer any that you have.
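As a starting point for the empirical threshold discussed in the first bullet, one common heuristic (not something derived from these data) is to measure the envelope over a stretch known to contain only noise and set the threshold several standard deviations above its mean. The sketch below uses synthetic data and base MATLAB only: movmax(abs(x)) is a crude stand-in for the envelope function, and the 10 ms window, the 0.5 s quiet interval, and the factor k = 8 are assumptions to adjust.

```matlab
% Hypothetical synthetic record: noise only at first, then a "response" at 1.2 s
Fs = 8000;                                % Sampling frequency (Hz), assumed
t  = (0:2*Fs-1).'/Fs;                     % 2 s time vector (s)
rng(1)                                    % Reproducible noise
x  = 0.05*randn(size(t));                 % Background noise
resp = t >= 1.2;                          % Response starts at 1.2 s
x(resp) = x(resp) + 0.8*sin(2*pi*250*t(resp));

% Crude peak envelope using base MATLAB only (stand-in for envelope())
env = movmax(abs(x), round(0.010*Fs));    % 10 ms sliding maximum

% Noise statistics from a stretch assumed to contain no signal
quiet = t < 0.5;
k     = 8;                                % Assumed safety factor
thr   = mean(env(quiet)) + k*std(env(quiet));

onset = t(find(env > thr, 1, 'first'));
fprintf('Threshold = %.3f, onset at about %.0f ms\n', thr, 1000*onset)
```

The result still needs to be checked by eye on a few trials; a threshold chosen this way is only as good as the assumption that the quiet stretch really is quiet.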
.
Wade
on 28 Oct 2025
Thank you for the file.
I am having a bit of a problem understanding the signal contents. There are three frequency bands, each about 250 Hz wide, beginning at about 500, 2000, and 2500 Hz, according to the pspectrum 'spectrogram' plot. Since they overlap only minimally, they can be separated from each other relatively efficiently by filtering, and then timed appropriately. (I filtered and plotted them individually. The bandpass filter frequency cutoffs can easily be changed as necessary.) What are they, and what should I do with them?
UZ = unzip('sample-1.zip')
[s,Fs] = audioread(UZ{1});
L = size(s,1)
t = linspace(0, L-1, L).'/Fs;
figure
plot(t, s(:,1), DisplayName='Left Channel')
hold on
% plot(t, s(:,2), DisplayName='Right Channel')
% plot(t, s(:,1)-s(:,2), DisplayName='Channel Difference')
hold off
grid
legend(Location='best')
[FTs1,Fv] = FFT1(s(:,1),t);
figure
plot(Fv, abs(FTs1)*2)
grid
xlabel('Frequency (Hz)')
ylabel('Magnitude')
xlim([0 6]*1E+3)
[p,f,tps] = pspectrum(s(:,1), Fs, 'spectrogram');
figure
surfc(tps,f,p, 'EdgeColor','none')
colormap(turbo)
colorbar
xlabel('Time (s)')
ylabel('Frequency (Hz)')
zlabel('Magnitude')
title('''pspectrum'' spectrogram')
ylim([0 3E+3])
view(0,90)
s500 = bandpass(s(:,1), [250 1000], Fs, ImpulseResponse='iir');
s2000 = bandpass(s(:,1), [1800 2200], Fs, ImpulseResponse='iir');
s2500 = bandpass(s(:,1), [2500 2750], Fs, ImpulseResponse='iir');
figure
tiledlayout(3,1)
nexttile
plot(t, s500)
grid
xlabel('Time (s)')
ylabel('250 - 1000 Hz')
nexttile
plot(t, s2000)
grid
xlabel('Time (s)')
ylabel('1800 - 2200 Hz')
nexttile
plot(t, s2500)
grid
xlabel('Time (s)')
ylabel('2500 - 2750 Hz')
sgtitle('Bandpass-Filtered s(:,1)')
% Fs = 44100; % Sampling Frequency (Hz)
% L = 5;
% t = linspace(0, Fs*L, Fs*L+1).'/Fs;
% ts = seconds(t); % Time Vector ('duration' Here)
% s = randn(size(t)) .* exp(-(t-2.2).^2*10); % Voice Response Signal
abs1 = abs(s(:,1));
[et,eb] = envelope(abs1, 1000, 'peak'); % Use 'peak' Option
thrshld = 0.15; % Detection Threshold Value
tidx = find(diff(sign(et - thrshld))); % Approximate Indices of Threshold Crossing
for k = 1:numel(tidx)-1
    idxrng = max(tidx(k)-1,1) : min(tidx(k)+1,L); % Index Range For Interpolation
    t_exact(k) = interp1(et(idxrng), t(idxrng), thrshld); % 'Exact' Time At Which The Upper Envelope Crosses The Threshold Value
    % fprintf('\nResponse envelope crosses detection threshold level at %.3f seconds\n', seconds(t_exact))
end
figure
plot(t, s, DisplayName='Response Signal')
hold on
plot(t, [et eb], LineWidth=2, DisplayName="Envelope")
hold off
grid
xlim([0 5])
xlabel("Time (s)")
ylabel("Amplitude")
yline(thrshld, '--k', "Detection Threshold", DisplayName='Detection Threshold')
% xline(t_exact, '-.r', "Response Onset Time", DisplayName="Response Onset Time")
% text(t_exact, 1.5, sprintf('%.3f s \\rightarrow',seconds(t_exact)), Horiz='right')
% legend(Location='best')
function [FTs1,Fv] = FFT1(s,t)
% One-Sided Numerical Fourier Transform
% Arguments:
%   s: Signal Vector Or Matrix
%   t: Associated Time Vector
    t = t(:);
    L = numel(t);
    if size(s,2) == L
        s = s.';
    end
    Fs = 1/mean(diff(t));
    Fn = Fs/2;
    NFFT = 2^nextpow2(L);
    FTs = fft((s - mean(s)) .* hann(L).*ones(1,size(s,2)), NFFT)/sum(hann(L));
    Fv = Fs*(0:(NFFT/2))/NFFT;
    % Fv = linspace(0, 1, NFFT/2+1)*Fn;
    Iv = 1:numel(Fv);
    Fv = Fv(:);
    FTs1 = FTs(Iv,:);
end
.
Wade
on 29 Oct 2025
I do not believe there is a problem. The signals can easily be separated by filtering them, and that is a significant advantage.
This is the best I can do with your data. The code is unfortunately fragile because of the nature of the signals, and while it should work with other records, it may not, without some tweaking.
I am not certain what the data actually are, or what you want to do with them.
The start and stop times of the segments are in the tables, however only the start times are plotted.
Try this --
UZ = unzip('sample-1.zip')
[s,Fs] = audioread(UZ{1});
L = size(s,1)
t = linspace(0, L-1, L).'/Fs;
figure
plot(t, s(:,1), DisplayName='Left Channel')
hold on
% plot(t, s(:,2), DisplayName='Right Channel')
% plot(t, s(:,1)-s(:,2), DisplayName='Channel Difference')
hold off
grid
legend(Location='best')
[FTs1,Fv] = FFT1(s(:,1),t);
figure
plot(Fv, abs(FTs1)*2)
grid
xlabel('Frequency (Hz)')
ylabel('Magnitude')
xlim([0 6]*1E+3)
[p,f,tps] = pspectrum(s(:,1), Fs, 'spectrogram');
figure
surfc(tps,f,p, 'EdgeColor','none')
colormap(turbo)
colorbar
xlabel('Time (s)')
ylabel('Frequency (Hz)')
zlabel('Magnitude')
title('''pspectrum'' spectrogram')
ylim([0 3E+3])
view(0,90)
s500 = bandpass(s(:,1), [250 750], Fs, ImpulseResponse='iir');
s2000 = bandpass(s(:,1), [1900 2100], Fs, ImpulseResponse='iir');
s2500 = bandpass(s(:,1), [2600 2700], Fs, ImpulseResponse='iir');
smtx = [s500 s2000 s2500];
figure
tiledlayout(3,1)
nexttile
plot(t, s500)
grid
xlabel('Time (s)')
ylabel('250 - 750 Hz')
nexttile
plot(t, s2000)
grid
xlabel('Time (s)')
ylabel('1900 - 2100 Hz')
nexttile
plot(t, s2500)
grid
xlabel('Time (s)')
ylabel('2600 - 2700 Hz')
sgtitle('Bandpass-Filtered s(:,1)')
ttlmtx = ["250 - 750 Hz", "1900 - 2100 Hz", "2600 - 2700 Hz"];
figure
tiledlayout(3,1)
for k1 = 1:size(smtx,2)
    [et,eb] = envelope(smtx(:,k1), 4500, 'peak'); % Use 'peak' Option
    thrshld = max(abs(smtx(:,k1)))*0.6; % Detection Threshold Value
    tidx = find(diff(sign(et - thrshld))); % Approximate Indices of Threshold Crossing
    % for k2 = 1:numel(tidx)-1
    %     idxrng = max(tidx(k2)-1,1) : min(tidx(k2)+1,L); % Index Range For Interpolation
    %     t_exact(k2,:) = interp1(et(idxrng), t(idxrng), thrshld) % 'Exact' Time At Which The Upper Envelope Crosses The Threshold Value
    %     tseg = t(idxrng)
    %     % fprintf('\nResponse envelope crosses detection threshold level at %.3f seconds\n', seconds(t_exact))
    % end
    % disp(t_exact)
    % t_exact2 = t_exact(1:floor(numel(t_exact)/2)*2)
    % t_exactr = reshape(t_exact2.', 2, []).'
    % Tss{k1} = array2table(t_exactr, VariableNames=["Segment Start","Segment End"])
    tidx = tidx(1:2*floor(numel(tidx)/2)); % Drop An Unpaired Final Crossing So 'reshape' Cannot Fail
    tidx2 = reshape(tidx, 2, []).'; % Rows: [Start Index, End Index]
    dmt2 = 1./diff([0; tidx2(:,1)]);
    Lv = isoutlier(dmt2,'movmedian',4); % Find & Eliminate 'Double Start' Entries
    tidx2 = tidx2(~Lv,:);
    sstimesr = t(tidx2);
    Tss{k1} = array2table(sstimesr, VariableNames=["Segment Start","Segment End"]);
    nexttile
    plot(t, smtx(:,k1), DisplayName='Response Signal')
    hold on
    plot(t, [et eb], LineWidth=1.5, DisplayName="Envelope")
    hold off
    grid
    % xlim([0 5])
    xlabel("Time (s)")
    ylabel("Amplitude")
    title(ttlmtx(k1))
    yline(thrshld, '--k', "Detection Threshold", DisplayName='Detection Threshold')
    xline(sstimesr(:,1), '-m')
    ylim(ylim+[-1 1])
end
Tss{:}
function [FTs1,Fv] = FFT1(s,t)
% One-Sided Numerical Fourier Transform
% Arguments:
%   s: Signal Vector Or Matrix
%   t: Associated Time Vector
    t = t(:);
    L = numel(t);
    if size(s,2) == L
        s = s.';
    end
    Fs = 1/mean(diff(t));
    Fn = Fs/2;
    NFFT = 2^nextpow2(L);
    FTs = fft((s - mean(s)) .* hann(L).*ones(1,size(s,2)), NFFT)/sum(hann(L));
    Fv = Fs*(0:(NFFT/2))/NFFT;
    % Fv = linspace(0, 1, NFFT/2+1)*Fn;
    Iv = 1:numel(Fv);
    Fv = Fv(:);
    FTs1 = FTs(Iv,:);
end
..
Star Strider
on 29 Oct 2025
If you want to extract the time difference, simply subtract one set of start times from another.
I am still not certain what differences you want to compute.
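To illustrate the subtraction mentioned above, here is a minimal sketch that pairs each beep with the first response onset inside its 3-second window and reports the latency in milliseconds. The event times are made up for illustration (they are not from the uploaded file), and the variable names beepEnd, voiceStart, and latency_ms are hypothetical. Trials with no response inside the window are left as NaN.

```matlab
% Hypothetical event times in seconds (not taken from the uploaded data)
beepEnd    = [1.00; 5.00; 9.00; 13.00];   % End of each beep
voiceStart = [1.42; 5.87; 13.25];         % Detected response onsets (no response to beep 3)
maxWait    = 3;                           % Response window (s), from the question

latency_ms = NaN(size(beepEnd));          % NaN marks "no response"
for k = 1:numel(beepEnd)
    % First voice onset that falls inside this trial's response window
    dt  = voiceStart - beepEnd(k);
    hit = find(dt >= 0 & dt <= maxWait, 1, 'first');
    if ~isempty(hit)
        latency_ms(k) = 1000*dt(hit);     % Convert seconds to milliseconds
    end
end

T = table((1:numel(beepEnd)).', latency_ms, VariableNames=["Trial","Latency_ms"]);
disp(T)
```

In practice beepEnd and voiceStart would come from the 'Segment End' and 'Segment Start' columns of the tables for the beep band and the voice band, respectively.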
Wade
on 29 Oct 2025
Star Strider
on 29 Oct 2025
(My computer crashed and it took a few minutes to get it back up. This is unusual for Ubuntu, so I have to see what caused it.)
There is no actual 'problem' with my code. It has to use a non-zero threshold ('Detection Threshold') to detect the onset of a signal segment, because of noise in the signal that it is not possible to eliminate completely. The detection threshold has to be low enough to detect the onset of a signal, and high enough to not detect noise as a false-positive. The 'Detection Threshold' is calculated from the signal characteristics for each signal, and has to be the same value for the entire signal in order to trust the results.
This is the same with the bandpass filters. It might be possible to narrow the passbands considerably, however that risks eliminating possibly necessary information from the filtered output.
This is the problem with real-world data -- it never behaves the way I want it to, so I can never produce the ideal result. I have done extensive biomedical signal processing, and noise and unwanted signal characteristics are always a problem. The best I can ever hope for is consistency, so that the derived data actually make some sense.
There is never an ideal solution to real-world problems. There are always compromises.
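The threshold trade-off described above can be seen in a small sketch on a synthetic envelope (a noise floor plus a 200 ms onset ramp beginning at 1.0 s). All values here are assumptions for illustration. Thresholds inside the noise floor trigger almost immediately (false positives), while higher thresholds report the onset progressively later.

```matlab
% Hypothetical synthetic envelope: noise floor plus a smooth onset ramp
Fs   = 1000;                              % Sampling frequency (Hz), assumed
t    = (0:3*Fs-1).'/Fs;                   % 3 s record
rng(2)                                    % Reproducible noise
ramp = min(max((t - 1.0)/0.2, 0), 1);     % Rises from 0 to 1 over 1.0 - 1.2 s
env  = 0.05*abs(randn(size(t))) + ramp;   % Envelope = noise floor + ramp

thrList = [0.02 0.10 0.30 0.60];          % Candidate detection thresholds
onsets  = NaN(size(thrList));             % First-crossing time for each threshold
for k = 1:numel(thrList)
    idx = find(env > thrList(k), 1, 'first');
    if ~isempty(idx)
        onsets(k) = t(idx);
    end
    fprintf('Threshold = %.2f -> first crossing at %.0f ms\n', thrList(k), 1000*onsets(k))
end
```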
Wade
on 30 Oct 2025
Star Strider
on 30 Oct 2025
You would have to run my code to separate the signals, do the filtering, and then listen to each one separately.
Sound playback in MATLAB Online only works with Google Chrome (I will not use Google Chrome), so I ran it on my desktop instead.
This works --
wavfile = websave('sample-1.zip','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1842543/sample-1.zip')
UZ = unzip(wavfile)
[s,Fs] = audioread(UZ{1});
L = size(s,1)
t = linspace(0, L-1, L).'/Fs;
s500 = bandpass(s(:,1), [250 750], Fs, ImpulseResponse='iir');
s2000 = bandpass(s(:,1), [1900 2100], Fs, ImpulseResponse='iir');
s2500 = bandpass(s(:,1), [2600 2700], Fs, ImpulseResponse='iir');
% sound(s500, Fs) % Voice
% sound(s2000, Fs) % Squeak
% sound(s2500, Fs) % Squeak
That should work as written. (I just tested it.) I commented out the sound calls. When you run that, un-comment them one at a time to listen to that particular vector.
The two that I labelled 'Squeak' sound similar to me, although they are obviously different in the pspectrum 'spectrogram' plot (they are not much different in frequency). I do not recognize much in the 'Voice' vector.
I also experimented with several different ways of finding the envelope (using a lowpass filter) and of finding the beginning of the signal (finding the peak and then finding the last lowest value in the preceding 10E+3-sample index range). None of those worked satisfactorily because of the noise in the signal.
These data are extremely difficult to work with, largely because I rarely work with speech signals, only with signals from various sorts of biomedical instrumentation.
.