How can I extract the time length (in milliseconds) between two audio signals?

I have a psychology experiment paradigm that asks participants to give a verbal response immediately after they hear a beep sound. Participants may or may not respond to the beep, and their response could be quick or slow. I need to extract the time length between the end of the beep sound and the start of their verbal response. This time length should be measured in milliseconds, as the total time allowed for each response was 3 seconds (3000 ms). There are hundreds of trials, so I would like to find a way to do the extraction automatically. How should I achieve this? Many thanks for any suggestions!

2 comments

Which toolboxes do you have available to use?
I recommend using the third-party Psychtoolbox for this kind of work.

Sign in to comment.

Answers (2)

Considering the nature of this problem, probably the best option is to estimate the signal envelopes with the Signal Processing Toolbox envelope function (use the 'peak' option with an appropriate window), decide on a threshold, and measure the time the envelope crosses the threshold.
It may be necessary to use a filter to eliminate noise. If you are using the lowpass function (or any of its relatives) for this, use the ImpulseResponse='iir' name-value pair for best results.
This approach has worked for me in the past.
It will probably be necessary to experiment to get the result you want.
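Condensed into a minimal sketch (synthetic data standing in for a recording; the window length and threshold are assumptions you would tune on real trials):

```matlab
% Onset detection via envelope + threshold (sketch, not a turnkey solution)
Fs = 44100;                                       % sampling frequency (Hz)
t  = (0:5*Fs-1).'/Fs;                             % 5-second time vector
x  = randn(size(t)) .* exp(-(t-2.2).^2*10);       % stand-in for a voice burst near t = 2.2 s
x  = lowpass(x, 4000, Fs, ImpulseResponse='iir'); % optional noise reduction
[envUp,~] = envelope(x, 1000, 'peak');            % upper signal envelope ('peak' option)
thr   = 0.25;                                     % detection threshold (set empirically)
onset = t(find(envUp > thr, 1, 'first'));         % first envelope crossing = response onset
```

The reaction time is then the difference between this onset and the (similarly detected) end of the beep.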

13 comments

Wade
Wade on 26 Oct 2025
Edited: Wade on 26 Oct 2025
Hi there,
Thank you so much for your answer. If I understand you correctly, I should find the peak of the beep sound and of the response voice, and then extract the length between the two peaks. I have no question about the former, as the beep sounds are identical and have the same frequency, envelope, etc. But I wonder whether this applies to the response voice signal, which may vary hugely and have different peaks. For example, if a participant offered an answer first and then immediately corrected that answer with a louder voice and higher frequency, the peak would fall on the latter part of the voice segment. But in that instance what I want to obtain is the distance between the beep peak and the first part of the voice segment, because it was the first utterance that reflects the reaction time. How should I solve this? Much appreciated!
My pleasure!
I would not use the peak values themselves; however, you do need to use the 'peak' option in your envelope call. If you want to know when the voice response begins, set a threshold and then determine the time the voice response envelope (I use the upper envelope here) first crosses that threshold.
Try something like this --
Fs = 44100; % Sampling Frequency (Hz)
L = 5;
t = linspace(0, Fs*L, Fs*L+1).'/Fs;
ts = seconds(t); % Time Vector ('duration' Here)
s = randn(size(t)) .* exp(-(t-2.2).^2*10); % Voice Response Signal
[et,eb] = envelope(s, 1000, 'peak'); % Use 'peak' Option
thrshld = 0.25; % Detection Threshold Value
tidx = find(diff(sign(et - thrshld))); % Approximate Indices of Threshold Crossing
idxrng = tidx(1)+[-1 0 1]; % Index Range For Interpolation
t_exact = interp1(et(idxrng), ts(idxrng), thrshld); % 'Exact' Value Of Upper Envelope Crossing Threshold Value
fprintf('\nResponse envelope crosses detection threshold level at %.3f seconds\n', seconds(t_exact))
Response envelope crosses detection threshold level at 1.690 seconds
figure
plot(ts, s, DisplayName='Response Signal')
hold on
plot(ts, [et eb], LineWidth=2, DisplayName="Envelope")
hold off
grid
xlabel("Time (s)")
ylabel("Amplitude")
yline(thrshld, '--k', "Detection Threshold", DisplayName='Detection Threshold')
xline(t_exact, '-.r', "Response Onset Time", DisplayName="Response Onset Time")
text(t_exact, 1.5, sprintf('%.3f s \\rightarrow',seconds(t_exact)), Horiz='right')
legend(Location='best')
Thanks a lot for the reply! I can roughly make sense of your code. But I have 3 questions:
  1. How should I determine the threshold?
  2. 1.690 s is the result in a window that starts from 0 s at the beep onset, but if another beep falls at a non-zero position on the x-axis, will the code return a time index or the actual duration?
  3. Do I need to do some pre-processing to remove noise? If so, how should I do that?
As a complete layman in audio signal processing, please forgive me if any of these questions looks stupid to you :)
My pleasure!
  1. I defined the threshold empirically here. There is usually some noise, even in a filtered signal, so the threshold needs to be greater than that level. Beyond that, the lowest value that gives the best results (the fastest detection) would be best. I doubt that there is a mathematical way to determine the best threshold.
  2. I do not fully understand your experiment. My code measures the time to voice response onset from the beginning of a specific record. It has no idea where the beeps are, so it simply returns the time to the voice response. (This is a simple example, and it could be made as comprehensive as necessary to give you the result you want.) If the beeps are recorded in the same record as the voice response, and all the beeps have the same frequency characteristics (ideally a single frequency), it would be relatively straightforward to separate them from the voice response and compute the times of the beeps and the time of the voice response separately. I would need representative data to explore this.
  3. I do not have a sample of your signal, so I cannot determine the noise characteristics. I usually use a Fourier transform of a signal to design the filter cutoffs, and determine the sort of filter I want (usually lowpass or bandpass).
I do not consider any questions to be 'stupid'! I will do my best to answer any that you have.
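One empirical approach to point 1 is to derive the threshold from a stretch of the record known to contain only background noise. This is a sketch, not part of the answer above; the 0.5-second silent lead-in and the factor of 5 are assumptions to adjust for real data:

```matlab
% Sketch: set the detection threshold from a known-silent segment
Fs = 44100;
t  = (0:5*Fs-1).'/Fs;
x  = 0.02*randn(size(t)) + randn(size(t)).*exp(-(t-2.2).^2*10); % noise floor + burst
[et,~]  = envelope(x, 1000, 'peak');    % upper envelope, as in the code above
noise   = et(1 : round(0.5*Fs));        % envelope over the assumed-silent first 0.5 s
thrshld = mean(noise) + 5*std(noise);   % noise floor plus a safety margin
```

Any threshold chosen this way should still be sanity-checked against a few trials by eye.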
Here's the sample audio file. I found I made a mistake in my initial description of the signal, so here's the correction: the stimulus was a spoken English number, the immediate response was a spoken Chinese number, and the final part was the beep sound reminding the participant to stop. These three sound segments made up a complete cycle (or trial, in psychological terms). Then another cycle follows immediately. I want to extract the time length between the first and the second voice segments. If the second segment does not exist (i.e., the participant failed to provide an answer), then extract the length between the first segment and the beep.
I don't know if it is feasible but I noticed that the stimulus and the response have drastically different frequencies and amplitudes. Maybe they could be separated according to freq/amplitudes?
Thank you for the file.
I am having a bit of a problem understanding the signal contents. There are three roughly 250 Hz-wide frequency bands, beginning at the low end at about 500, 2000, and 2500 Hz, according to the 'pspectrum' spectrogram plot. Since they overlap minimally, they can be separated from each other relatively efficiently by filtering, and then timed appropriately. (I filtered and plotted them individually. The bandpass filter cutoff frequencies can easily be changed as necessary.) What are they, and what should I do with them?
UZ = unzip('sample-1.zip')
UZ = 1×1 cell array
{'sample-1.wav'}
[s,Fs] = audioread(UZ{1});
L = size(s,1)
L = 867153
t = linspace(0, L-1, L).'/Fs;
figure
plot(t, s(:,1), DisplayName='Left Channel')
hold on
% plot(t, s(:,2), DisplayName='Right Channel')
% plot(t, s(:,1)-s(:,2), DisplayName='Channel Difference')
hold off
grid
legend(Location='best')
[FTs1,Fv] = FFT1(s(:,1),t);
figure
plot(Fv, abs(FTs1)*2)
grid
xlabel('Frequency (Hz)')
ylabel('Magnitude')
xlim([0 6]*1E+3)
[p,f,tps] = pspectrum(s(:,1), Fs, 'spectrogram');
figure
surfc(tps,f,p, 'EdgeColor','none')
colormap(turbo)
colorbar
xlabel('Time (s)')
ylabel('Frequency (Hz)')
zlabel('Magnitude')
title('''pspectrum'' spectrogram')
ylim([0 3E+3])
view(0,90)
s500 = bandpass(s(:,1), [250 1000], Fs, ImpulseResponse='iir');
s2000 = bandpass(s(:,1), [1800 2200], Fs, ImpulseResponse='iir');
s2500 = bandpass(s(:,1), [2500 2750], Fs, ImpulseResponse='iir');
figure
tiledlayout(3,1)
nexttile
plot(t, s500)
grid
xlabel('Time (s)')
ylabel('250 - 1000 Hz')
nexttile
plot(t, s2000)
grid
xlabel('Time (s)')
ylabel('1800 - 2200 Hz')
nexttile
plot(t, s2500)
grid
xlabel('Time (s)')
ylabel('2500 - 2750 Hz')
sgtitle('Bandpass-Filtered s(:,1)')
% Fs = 44100; % Sampling Frequency (z)
% L = 5;
% t = linspace(0, Fs*L, Fs*L+1).'/Fs;
% ts = seconds(t); % Time Vector ('duration' Here)
% s = randn(size(t)) .* exp(-(t-2.2).^2*10); % Voice Response Signal
abs1 = abs(s(:,1));
[et,eb] = envelope(abs1, 1000, 'peak'); % Use 'peak' Option
thrshld = 0.15; % Detection Threshold Value
tidx = find(diff(sign(et - thrshld))); % Approximate Indices of Threshold Crossing
for k = 1:numel(tidx)-1
idxrng = max(tidx(k)-1,1) : min(tidx(k)+1,L); % Index Range For Interpolation
t_exact(k) = interp1(et(idxrng), t(idxrng), thrshld); % 'Exact' Value Of Upper Envelope Crossing Threshold Value
% fprintf('\nResponse envelope crosses detection threshold level at %.3f seconds\n', seconds(t_exact))
end
figure
plot(t, s, DisplayName='Response Signal')
hold on
plot(t, [et eb], LineWidth=2, DisplayName="Envelope")
hold off
grid
xlim([0 5])
xlabel("Time (s)")
ylabel("Amplitude")
yline(thrshld, '--k', "Detection Threshold", DisplayName='Detection Threshold')
% xline(t_exact, '-.r', "Response Onset Time", DisplayName="Response Onset Time")
% text(t_exact, 1.5, sprintf('%.3f s \\rightarrow',seconds(t_exact)), Horiz='right')
% legend(Location='best')
function [FTs1,Fv] = FFT1(s,t)
% One-Sided Numerical Fourier Transform
% Arguments:
% s: Signal Vector Or Matrix
% t: Associated Time Vector
t = t(:);
L = numel(t);
if size(s,2) == L
s = s.';
end
Fs = 1/mean(diff(t));
Fn = Fs/2;
NFFT = 2^nextpow2(L);
FTs = fft((s - mean(s)) .* hann(L).*ones(1,size(s,2)), NFFT)/sum(hann(L));
Fv = Fs*(0:(NFFT/2))/NFFT;
% Fv = linspace(0, 1, NFFT/2+1)*Fn;
Iv = 1:numel(Fv);
Fv = Fv(:);
FTs1 = FTs(Iv,:);
end
I guess the three frequency ranges correspond to the three sound segments: the stimulus (an English number made with text-to-speech software), the response (real human voice, from one and the same person), and a beep cut and pasted from a third-party audio file. As they came from different sources and were not processed to match in frequency, I think that may explain the problem.
I do not believe there is a problem. The signals can easily be separated by filtering them, and that is a significant advantage.
This is the best I can do with your data. The code is unfortunately fragile because of the nature of the signals, and while it should work with other records, it may not, without some tweaking.
I am not certain what the data actually are, and what you want to do with them.
The start and stop times of the segments are in the tables, however only the start times are plotted.
Try this --
UZ = unzip('sample-1.zip')
UZ = 1×1 cell array
{'sample-1.wav'}
[s,Fs] = audioread(UZ{1});
L = size(s,1)
L = 867153
t = linspace(0, L-1, L).'/Fs;
figure
plot(t, s(:,1), DisplayName='Left Channel')
hold on
% plot(t, s(:,2), DisplayName='Right Channel')
% plot(t, s(:,1)-s(:,2), DisplayName='Channel Difference')
hold off
grid
legend(Location='best')
[FTs1,Fv] = FFT1(s(:,1),t);
figure
plot(Fv, abs(FTs1)*2)
grid
xlabel('Frequency (Hz)')
ylabel('Magnitude')
xlim([0 6]*1E+3)
[p,f,tps] = pspectrum(s(:,1), Fs, 'spectrogram');
figure
surfc(tps,f,p, 'EdgeColor','none')
colormap(turbo)
colorbar
xlabel('Time (s)')
ylabel('Frequency (Hz)')
zlabel('Magnitude')
title('''pspectrum'' spectrogram')
ylim([0 3E+3])
view(0,90)
s500 = bandpass(s(:,1), [250 750], Fs, ImpulseResponse='iir');
s2000 = bandpass(s(:,1), [1900 2100], Fs, ImpulseResponse='iir');
s2500 = bandpass(s(:,1), [2600 2700], Fs, ImpulseResponse='iir');
smtx = [s500 s2000 s2500];
figure
tiledlayout(3,1)
nexttile
plot(t, s500)
grid
xlabel('Time (s)')
ylabel('250 - 750 Hz')
nexttile
plot(t, s2000)
grid
xlabel('Time (s)')
ylabel('1900 - 2100 Hz')
nexttile
plot(t, s2500)
grid
xlabel('Time (s)')
ylabel('2600 - 2700 Hz')
sgtitle('Bandpass-Filtered s(:,1)')
ttlmtx = ["250 - 750 Hz", "1900 - 2100 Hz", "2600 - 2700 Hz"];
figure
tiledlayout(3,1)
for k1 = 1:size(smtx,2)
[et,eb] = envelope(smtx(:,k1), 4500, 'peak'); % Use 'peak' Option
thrshld = max(abs(smtx(:,k1)))*0.6; % Detection Threshold Value
tidx = find(diff(sign(et - thrshld))); % Approximate Indices of Threshold Crossing
% for k2 = 1:numel(tidx)-1
% idxrng = max(tidx(k2)-1,1) : min(tidx(k2)+1,L); % Index Range For Interpolation
% t_exact(k2,:) = interp1(et(idxrng), t(idxrng), thrshld) % 'Exact' Value Of Upper Envelope Crossing Threshold Value
% tseg = t(idxrng)
% % fprintf('\nResponse envelope crosses detection threshold level at %.3f seconds\n', seconds(t_exact))
% end
% disp(t_exact)
% t_exact2 = t_exact(1:floor(numel(t_exact)/2)*2)
% t_exactr = reshape(t_exact2.', 2, []).'
% Tss{k1} = array2table(t_exactr, VariableNames=["Segment Start","Segment End"])
tidx2 = reshape(tidx, 2, []).';
dmt2 = 1./diff([0; tidx2(:,1)]);
Lv = isoutlier(dmt2,'movmedian',4); % Find & Eliminate 'Double Start' Entries
tidx2 = tidx2(~Lv,:);
sstimesr = t(tidx2);
Tss{k1} = array2table(sstimesr, VariableNames=["Segment Start","Segment End"]);
nexttile
plot(t, smtx(:,k1), DisplayName='Response Signal')
hold on
plot(t, [et eb], LineWidth=1.5, DisplayName="Envelope")
hold off
grid
% xlim([0 5])
xlabel("Time (s)")
ylabel("Amplitude")
title(ttlmtx(k1))
yline(thrshld, '--k', "Detection Threshold", DisplayName='Detection Threshold')
xline(sstimesr(:,1), '-m')
ylim(ylim+[-1 1])
end
Tss{:}
ans = 5×2 table
Segment Start    Segment End
_____________    ___________
      0.83998        0.98923
       4.5582         4.7937
       8.3503         8.5734
       12.085         12.156
       16.094         16.291
ans = 5×2 table
Segment Start    Segment End
_____________    ___________
       3.8804         4.2529
       7.6504         8.0385
       11.475         11.865
       15.249         15.672
       19.167         19.573
ans = 5×2 table
Segment Start    Segment End
_____________    ___________
       3.4014         3.5273
       7.1484         7.2903
       11.036         11.131
       14.759         14.927
       18.698         18.825
function [FTs1,Fv] = FFT1(s,t)
% One-Sided Numerical Fourier Transform
% Arguments:
% s: Signal Vector Or Matrix
% t: Associated Time Vector
t = t(:);
L = numel(t);
if size(s,2) == L
s = s.';
end
Fs = 1/mean(diff(t));
Fn = Fs/2;
NFFT = 2^nextpow2(L);
FTs = fft((s - mean(s)) .* hann(L).*ones(1,size(s,2)), NFFT)/sum(hann(L));
Fv = Fs*(0:(NFFT/2))/NFFT;
% Fv = linspace(0, 1, NFFT/2+1)*Fn;
Iv = 1:numel(Fv);
Fv = Fv(:);
FTs1 = FTs(Iv,:);
end
If you want to extract the time difference, simply subtract one set of start times from another.
I am still not certain what differences you want to compute.
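For example, assuming the first table (Tss{1}) holds the stimulus segments and the second (Tss{2}) the responses, with one row per trial in each (an assumption — verify which band is which by listening first), the per-trial gap could be computed like this; the numbers below are taken from the tables above:

```matlab
% Hypothetical example: stimulus and response segment times, one row per trial
Tstim = array2table([0.83998 0.98923; 4.5582 4.7937], ...
    VariableNames=["Segment Start","Segment End"]);
Tresp = array2table([3.8804 4.2529; 7.6504 8.0385], ...
    VariableNames=["Segment Start","Segment End"]);
% Reaction time per trial: response onset minus stimulus offset, in ms
RT_ms = (Tresp.("Segment Start") - Tstim.("Segment End")) * 1000;
```

With the real data you would substitute Tss{…} for the hand-built tables, after handling trials with no response.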
I looked up the numbers in the table and compared them with the actual audio signal, and found that they do not correspond. For example, in the table the first segment starts at 0.83998 seconds and ends at 0.98923 seconds, but the actual data start at 0.619 seconds (see below).
For the other segment boundaries, the values also did not correspond to the actual data. What might be the problem here?
(My computer crashed and it took a few minutes to get it back up. This is unusual for Ubuntu, so I have to see what caused it.)
There is no actual 'problem' with my code. It has to use a non-zero threshold ('Detection Threshold') to detect the onset of a signal segment, because of noise in the signal that it is not possible to eliminate completely. The detection threshold has to be low enough to detect the onset of a signal, and high enough to not detect noise as a false-positive. The 'Detection Threshold' is calculated from the signal characteristics for each signal, and has to be the same value for the entire signal in order to trust the results.
This is the same with the bandpass filters. It might be possible to narrow the passbands considerably, however that risks eliminating possibly necessary information from the filtered output.
This is the problem with real-world data -- it never behaves the way I want it to, so I can never produce the ideal result. I have done extensive biomedical signal processing, and noise and unwanted signal characteristics are always a problem. The best I can ever hope for is consistency, so that the derived data actually make some sense.
There is never an ideal solution to real-world problems. There are always compromises.
Since you've separated the audio into three different signal fragments according to their frequency, I wonder if there is a way to play each of them so that I can double-check which is which.
You would have to run my code to separate the signals, do the filtering, and then listen to each one separately.
This only works with Google Chrome with MATLAB Online (I will not use Google Chrome), so I ran it on my desktop instead.
This works --
wavfile = websave('sample-1.zip','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1842543/sample-1.zip')
UZ = unzip(wavfile)
[s,Fs] = audioread(UZ{1});
L = size(s,1)
t = linspace(0, L-1, L).'/Fs;
s500 = bandpass(s(:,1), [250 750], Fs, ImpulseResponse='iir');
s2000 = bandpass(s(:,1), [1900 2100], Fs, ImpulseResponse='iir');
s2500 = bandpass(s(:,1), [2600 2700], Fs, ImpulseResponse='iir');
% sound(s500, Fs) % Voice
% sound(s2000, Fs) % Squeak
% sound(s2500, Fs) % Squeak
That should work as written. (I just tested it.) I commented out the sound calls. When you run it, un-comment them one at a time to listen to that particular vector.
The two that I labelled 'Squeak' sound similar to me, although they are obviously different in the pspectrum 'spectrogram' plot (they are not much different in frequency). I do not recognize much in the 'Voice' vector.
I also experimented with several different ways of finding the envelope (using a lowpass filter) and of finding the beginning of the signal (finding the peak and then finding the last lowest value of the preceding 10E+3 index range). None of those worked satisfactorily because of the noise in the signal.
These data are extremely difficult to work with, largely because I rarely work with speech signals, only with signals from various sorts of biomedical instrumentation.

Sign in to comment.

Version

R2021a

Asked:

on 25 Oct 2025

Commented:

on 30 Oct 2025
