Speech through few channels
I am sorry I took so long to get around to it, but I
finally had a chance to listen to the tape you sent of
the simulation of speech and of music heard through as
few as four channels. It is evident from listening to
the recordings that the recognizability of the speech survives
better than that of the music.
The reason for this is fairly clear: the simulation destroys
the pitch information, but seems to retain gross spectral information in
the low channels and good timing information in the high ones. This
sort of information reduction hurts the speech a lot less than it does the
music. It is obvious that speech can be quite intelligible without pitch
information, as we can see in the case of whispered speech. However,
pitch is very important to most music. I imagine that if you changed
the relative importance of pitch in speech and music -- say by using Chinese,
in which tones are important, as the speech, and drumming as the music --
you might get results less favorable to speech. Another way to increase
the importance of pitch in speech would be to ask the listener to report
on the emotional content of the speech.
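In case it helps to make the mechanism concrete, here is a rough sketch, in Python with NumPy, of the kind of few-channel noise-vocoder simulation I take the tape to be. Everything specific here is my own assumption, not taken from the tape: four log-spaced bands between 100 and 4000 Hz, and a roughly 50 Hz envelope smoother. The point it illustrates is the one above: the band envelopes carry gross spectral shape and timing, while the fine structure that carries pitch is replaced by noise.

```python
import numpy as np

def noise_vocode(x, fs, n_channels=4, f_lo=100.0, f_hi=4000.0, env_cut=50.0):
    """Toy n-channel noise vocoder (an assumed reconstruction, not the
    actual simulation on the tape): split the spectrum into log-spaced
    bands, keep only each band's slow amplitude envelope, and reimpose
    that envelope on band-limited noise.  Fine structure -- and with it
    pitch -- is destroyed; gross spectral shape and timing survive."""
    n = len(x)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    X = np.fft.rfft(x)
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # assumed band edges
    rng = np.random.default_rng(0)
    noise_spec = np.fft.rfft(rng.standard_normal(n))
    # Moving-average smoother ~ (1 / env_cut) seconds wide: keeps only
    # the slow amplitude envelope of each band.
    win = max(1, int(fs / env_cut))
    kernel = np.ones(win) / win
    out = np.zeros(n)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        band = np.fft.irfft(X * mask, n)                      # analysis band
        env = np.convolve(np.abs(band), kernel, mode="same")  # slow envelope
        carrier = np.fft.irfft(noise_spec * mask, n)          # band noise
        out += env * carrier
    return out
```

With drumming as the input, the envelopes alone carry most of what matters, so such a simulation should be relatively kind to it; with tonal speech, the discarded fine structure carries the tones, which is the asymmetry suggested above.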
In any case, the conclusion I draw is not that speech recognition needs
less acoustic information than music recognition does, but rather that
the two domains depend on different kinds of information.
Presumably the recognition process, in a lot of different domains of sound,
is sensitive to the integrity of different acoustic features. Therefore
there is no single best way to reduce information: The best kind of
information reduction depends on the domain.