
Spectral tilt and sharper peaks vs TIMIT


I've recently begun some speech recognition work and would really
appreciate some assistance.  If this is an inappropriate location
for beginner questions, alternate suggestions would be appreciated :)

My current problem is the seemingly large disparity between the
spectral characteristics of the TIMIT continuous speech corpus
and speech samples that I've recorded locally.

More specifically, the locally recorded speech seems to have more
power in the upper frequencies (> 2000 Hz) and *much* sharper frequency
peaks throughout the spectrum.  On the low end of the spectrum, the
locally recorded data has very sharp peaks at (what appears to be) the
fundamental pitch (F0) and its harmonics.  These are much less
prominent in the TIMIT corpus.

I reckon the spectral tilt could be due to differences in the
frequency response of my microphone (Shure RS130), pre-amp
(Rolls MP13), or other assorted components.  However, I am currently
at a loss to explain the "peakiness" of my data relative to
that in the TIMIT corpus.
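
For reference, here is a rough sketch of how the tilt comparison
described above could be put into numbers.  It averages the power
spectrum over Hann-windowed frames and reports the ratio of power
above 2000 Hz to power below it, in dB.  The function names, the
512-sample frame, and the 256-sample hop are illustrative assumptions
rather than settings from any particular toolkit; 16 kHz is the TIMIT
sampling rate.

```python
import numpy as np


def average_power_spectrum(signal, sample_rate, frame_len=512, hop=256):
    """Return (freqs_hz, mean_power) averaged over Hann-windowed frames.

    frame_len and hop are illustrative defaults (assumptions), not values
    taken from the original post.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    power = np.zeros(frame_len // 2 + 1)
    for i in range(n_frames):
        start = i * hop
        frame = signal[start:start + frame_len] * window
        power += np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    return freqs, power / n_frames


def high_band_ratio_db(signal, sample_rate, split_hz=2000.0):
    """Power above split_hz relative to power at or below it, in dB."""
    freqs, power = average_power_spectrum(signal, sample_rate)
    eps = 1e-20  # guards against log(0) when a band is empty
    high = power[freqs > split_hz].sum() + eps
    low = power[freqs <= split_hz].sum() + eps
    return 10.0 * np.log10(high / low)


# Synthetic check: one second of a 500 Hz tone and of a 3000 Hz tone.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
low_tone = np.sin(2 * np.pi * 500 * t)
high_tone = np.sin(2 * np.pi * 3000 * t)
low_ratio = high_band_ratio_db(low_tone, sample_rate)
high_ratio = high_band_ratio_db(high_tone, sample_rate)
```

Running the same function on a TIMIT utterance and on a local recording
gives one number per file to compare.  The frame length sets the
frequency resolution (sample_rate / frame_len Hz per bin), so both
sources would need identical settings before comparing how sharp the
peaks look.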

Many thanks, Scott.