Your question matters more for the signal level than for the noise level, since noise fluctuates much less. Even 4-talker babble is quite steady compared to speech; you can see this for yourself with box-and-whisker plots of single-talker and noise envelopes.
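One rough way to make that comparison numerically, without plotting, is to measure short-term levels and look at their interquartile range (the height of the box in a box-and-whisker plot). The sketch below is only an illustration: the function names, the 100 ms window, and the assumption that samples are floats scaled to full scale 1.0 are mine, not part of any standard procedure.

```python
import math
import statistics


def windowed_levels_db(samples, sample_rate, window_s=0.1):
    """RMS level in dB (re full scale 1.0) for consecutive non-overlapping windows."""
    n = int(round(window_s * sample_rate))
    levels = []
    for start in range(0, len(samples) - n + 1, n):
        frame = samples[start:start + n]
        mean_square = sum(s * s for s in frame) / n
        if mean_square > 0:  # skip digitally silent windows (log of zero is undefined)
            levels.append(10 * math.log10(mean_square))
    return levels


def level_spread_db(levels):
    """Interquartile range of the window levels: the box height in a box plot."""
    q1, _, q3 = statistics.quantiles(levels, n=4)
    return q3 - q1
```

Running both functions on a single-talker recording and on the babble should give a noticeably larger spread for the single talker if the observation above holds for your material.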
Often the SNR is (or was?) set in the free-field and so averaged over long tracts of speech. I'm sorry I don't have any references on hand.
This would also imply that SNR is calculated before pre-emphasis. I would argue that it is the long run SNR in the environment that counts after all. How you process sound to improve the signal presentation (hopefully) is up to you. I'm not sure of the need to average in small windows; it might be overkill since SNR fluctuation is natural and will even out in the long run anyway.
In my own research with cochlear implants, I generally determined the rms level over each full sentence. Just be aware that silences in each track (if they exist) lead to underestimating the signal level and hence biasing the SNR. So you may wish to use a very small threshold to exclude silences from the calculation.
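As a minimal sketch of that calculation (not my actual research code): the functions below take the rms over a whole sentence, skip samples whose magnitude falls below a small threshold, and return the gain to apply to the noise for a target SNR in dB. The function names and the default threshold of 1e-4 are assumptions for illustration, as is gating sample by sample rather than in short frames.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    if not samples:
        raise ValueError("no samples to measure")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def rms_excluding_silence(samples, threshold=1e-4):
    """RMS over the samples whose absolute value exceeds a small threshold.

    The default threshold is an assumed value; choose one that suits
    the noise floor of your recordings.
    """
    active = [s for s in samples if abs(s) > threshold]
    return rms(active)


def noise_gain_for_snr(speech, noise, snr_db, threshold=1e-4):
    """Linear gain for the noise so that speech rms / noise rms matches snr_db."""
    speech_level = rms_excluding_silence(speech, threshold)
    noise_level = rms(noise)
    return speech_level / (noise_level * 10 ** (snr_db / 20))
```

Multiplying every noise sample by the returned gain before mixing gives the requested SNR relative to the silence-excluded speech level. Computing the speech level with plain `rms` instead shows how much padded silence would have shifted the result.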
From: AUDITORY - Research in Auditory Perception on behalf of Leonid Litvak
Sent: Thu 8/13/2009 5:57 AM
Subject: [AUDITORY] Question on defining S/N ratio in speech-in-noise testing
I have a question regarding definition of signal-to-noise ratio as it
applies to speech-in-noise testing, with speech material being sentences. On
a simple level, SNR is just level of the signal divided by the level of the
noise.
The signal is typically speech, so its level fluctuates over time. Do people
typically use the average signal level computed over the whole sentence,
average signal level computed in 100 ms windows, median signal level,
maximum signal level, etc.?
The same question could go for the noise token as well.
I would very much appreciate references to papers that discuss these issues.
Finally, we are interested to apply these tests to cochlear implant
recipients that have a well-characterized pre-emphasis curve as part of
their processor. Should the pre-emphasis curve be taken into account when
computing S/N ratios? This is not an issue for spectrally-matched noises,
but may be an issue for non-matched noises.
Thank you very much!