How could I compare the placement/timing of transients between two audio files?

A brief preface: I'm sorry if a post with a similar idea already exists on this site. I've been limited in how much time I could put into this question, so I figured I would post it now and return to these forums once I'm finished with finals.
This idea comes from the intersection of my interests as a percussionist and as an undergrad studying computer and electrical engineering who's interested in signal analysis.
Defining a term
Within a drumline, the instrumentalists are said to be playing "cleanly" when several drummers play closely enough together that they sound less like a collection of drummers and more like only a couple (or, ideally, a single drummer). Conversely, a line of drummers that is attempting to play together but failing to do so is said to be "dirty." Playing "cleanly" with another drummer on the same drum can feel/sound similar to double-bouncing someone on a trampoline, whereas playing "dirtily" won't feel/sound any different from normal playing. Often, the difference between the two states is only a few milliseconds. For the sake of over-defining what I mean, I'll give an example of a clean and a dirty snareline (both from the same corps so as to not single any corps out haha). You should only need to listen to the first few seconds of each sample to hear what I'm talking about.
An example of a clean snareline: https://youtu.be/__DqouywTTo?t=6
An example of a dirtier snareline (it's hard to catch these guys playing poorly, though their entrance to this brief phrase isn't together): https://youtu.be/4cleiXKmx_A?t=206
What's my point?
While members of a drumline only need to use their ears to know if they're clean or not, the engineer in me is wondering if I can empirically determine the cleanliness of a drummer by comparing a recording of them playing a piece of music to a recording of the original drumline playing the same piece of music.
I know a few variables need to be acknowledged/removed for the analysis to be better-defined, so here are some assumptions that I've thought of:
  • The two recordings will be only of their respective, isolated groups of instruments, i.e. an individual playing a snare part would be compared to only a snareline playing the same part. Attempting to isolate the same snareline from a recording of a full drumline (including snares, tenors, basses, etc.) is beyond the scope of this particular problem.
  • The two recordings would already be processed in a DAW such that they're trimmed to similar lengths and roughly synchronized. Parsing through the recordings to find the sequences is beyond the scope of this particular problem.
  • The reference snareline recording would be of the snareline playing the relevant section not-dirtily.
  • The analysis should be purely of the relative loudness of the two recordings, as the acoustic signatures of the drums, environments, etc. cannot ever be the same.
  • Both recordings will be clear of any irrelevant transients (e.g. clapping, talking, random stick clicking, etc.). If there are any such transients, they should be discarded as outliers.
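To make the loudness-only assumption concrete, here is a minimal sketch in Python (standard library only) that reduces a mono recording to a short-time RMS envelope and picks the loudest point of each transient. The function names and the parameters frame_len, hop, threshold, and min_gap_ms are hypothetical and would need tuning against real recordings; this is one possible approach, not an established method for this problem.

```python
import math

def rms_envelope(samples, frame_len, hop):
    """Short-time RMS loudness of a mono signal, one value per hop."""
    env = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        env.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return env

def peak_times_ms(env, hop, sample_rate, threshold, min_gap_ms):
    """Times (ms, measured at the frame start) of local maxima of the
    envelope that reach `threshold`. Peaks closer together than
    `min_gap_ms` are merged, keeping the louder one (assumption: two
    notes are never closer than `min_gap_ms`)."""
    peaks = []  # list of (time_ms, level)
    for i in range(1, len(env) - 1):
        is_local_max = env[i] > env[i - 1] and env[i] >= env[i + 1]
        if env[i] >= threshold and is_local_max:
            t = i * hop * 1000.0 / sample_rate
            if peaks and t - peaks[-1][0] < min_gap_ms:
                if env[i] > peaks[-1][1]:
                    peaks[-1] = (t, env[i])
            else:
                peaks.append((t, env[i]))
    return [t for t, _ in peaks]
```

Because only the envelope is used, differences in drum tuning or room acoustics affect the result less than a sample-by-sample comparison would. Quiet stray sounds fall below the threshold, but a loud stray transient would still be picked up and has to be rejected later as an outlier.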
I imagine the output of this analysis would be a value, in milliseconds, representing the average absolute distance between the attacks of corresponding transients in the two recordings. And, as the exact signatures of the transients will definitely differ between the two recordings, I'd imagine the sample selected to represent each transient should be the loudest average sample in its attack. This is as opposed to choosing the first sample of each transient's attack: in a recording of a snareline, one drummer may play earlier than the rest of the line, so the selected timing would then represent the early drummer instead of the whole snareline.
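The metric described above could be sketched as follows, assuming the attack times of both recordings have already been extracted as lists of milliseconds. The function names and the 50 ms matching window are assumptions for illustration; any transient without a partner inside the window is treated as an outlier and skipped.

```python
def match_onsets(ref_ms, test_ms, max_offset_ms):
    """Pair each reference onset with the nearest not-yet-used test onset,
    in time order. Pairs further apart than max_offset_ms are dropped."""
    ref = sorted(ref_ms)
    test = sorted(test_ms)
    pairs = []
    j = 0
    for r in ref:
        # Move forward while the next test onset is at least as close to r.
        while j + 1 < len(test) and abs(test[j + 1] - r) <= abs(test[j] - r):
            j += 1
        if j < len(test) and abs(test[j] - r) <= max_offset_ms:
            pairs.append((r, test[j]))
            j += 1  # each test onset is used at most once
    return pairs

def mean_abs_offset_ms(ref_ms, test_ms, max_offset_ms=50.0):
    """Average absolute timing difference (ms) over matched onsets,
    or None if nothing could be matched."""
    pairs = match_onsets(ref_ms, test_ms, max_offset_ms)
    if not pairs:
        return None
    return sum(abs(t - r) for r, t in pairs) / len(pairs)

# Hypothetical onset times in ms; the 250 ms entry is a stray click.
reference = [0.0, 500.0, 1000.0]
player = [4.0, 250.0, 497.0, 1006.0]
print(mean_abs_offset_ms(reference, player))
```

The greedy in-order matching relies on the two recordings already being roughly synchronized, as assumed earlier. If the player drifts by more than the window, the onsets would need a proper alignment step first.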
It might also be useful to compare the relative differences in amplitude between the transients in both recordings (this would compare how similarly the players handle variations in dynamics), but I'm only interested in analyzing the timings of the transients for now.
If anyone has made it this far in the post, you have my sincere thanks. If you have any particular advice/ideas, your input would be much appreciated. If you have any questions, don't hesitate to ask. By no means am I asking for a reply with a finished project - I just need to be pointed to the correct resources.
Also, in case you're wondering, I promise you aren't doing my homework. This spawned from a random thought I had while learning an exercise.

Answers (0)
