Re: speech/music


> Houtgast & Steeneken).  Rather, I think a lot of our uncanny ability to
> pick out speech from the most bizarrely-distorted signals comes from the
> very highly sophisticated speech-pattern decoders we have, which employ all
> of the many levels of constraints applicable to speech percepts (from
> low-level pitch continuity through to high level semantic prior
> likelihoods) to create and develop plausible speech hypotheses that can
> account for portions of the perceived auditory scene.

I have problems with this. (sorry)
I'm sure you must be able to detect the presence of speech independently
of being able to recognise it. If someone spoke to me in Finnish, say, I
would be able to tell they were speaking (even in the presence of
background music/noise), even though I couldn't even segment the words,
never mind syntactically or semantically parse them.
I think there must be some way the brain splits up (deconvolves) the
signal before applying a speech recogniser.
(I have no proof of this, of course; it's just a gut feeling.)

I agree that a recogniser which could cope with speech would be the
ideal solution, but there are problems in training appropriate models to
recognise music you haven't seen before (the current HMM methods assume
the training data represents, in some way, the same distribution as the
test data). Also, given time constraints, removing any audio without
relevant information content before recognition is helpful.

I don't have the slightest idea how the brain detects speech, but it
would seem logical to me that it can do so on a very low-level acoustic
basis. If this were true, then in theory a front-end speech detector
should be possible.

I admit I know very little about this subject, so I am looking forward
to people correcting me.

Thanks for all your comments.