Hello, everybody,

I believe that the "hardy perennial" pitch determination problem of
speech (and not only speech) signal processing has its final solution now.
The solution is very fundamental and general in nature (and, also, quite
and has nothing to do with doing FFT or correlation of a signal.
It is based on state-space embedding of a signal - a concept originally
introduced more than 20 years ago for analyzing nonlinear and chaotic time
series.
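
For readers unfamiliar with the term: a time-delay embedding maps a scalar
signal x[n] to vectors (x[n], x[n+tau], ..., x[n+(m-1)*tau]) in an
m-dimensional state space. The sketch below illustrates only that general
concept. It is not the method of the ICASSP paper or the patent application,
and the function name delay_embed and the parameters dim and tau are
hypothetical names chosen for illustration.

```python
import math


def delay_embed(signal, dim, tau):
    """Return the list of delay vectors of a scalar signal.

    Each vector is (x[n], x[n + tau], ..., x[n + (dim - 1) * tau]).
    `dim` (embedding dimension) and `tau` (delay in samples) are
    illustrative parameters, not values taken from the paper.
    """
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    span = (dim - 1) * tau  # samples needed beyond the first coordinate
    return [
        tuple(signal[n + k * tau] for k in range(dim))
        for n in range(len(signal) - span)
    ]


# A periodic test signal with a period of 8 samples (assumed example data).
period = 8
x = [math.sin(2.0 * math.pi * n / period) for n in range(64)]
vectors = delay_embed(x, dim=3, tau=2)

# For a periodic signal the embedded trajectory closes on itself:
# the state one period later coincides with the current state.
distance = math.dist(vectors[0], vectors[period])
```

For a periodic input, points of the embedded trajectory that are one period
apart coincide (up to rounding), which is the general property that makes an
embedding relevant to periodicity analysis.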

I presented the new method at ICASSP 2002 in Orlando, Florida.

The ICASSP paper and the Matlab demo program are available from

It was also posted to Google comp.dsp and comp.speech.research groups.
To date, we have had quite a large number of downloads of the paper and demo
from all over the world.
I believe that some people in this group have actually read the paper and
downloaded our demo too.

Our US patent application has a somewhat more detailed description of the
new pitch determination method.
It can be accessed online at http://www.uspto.gov/patft
(Pub. No.: 20030088401)
You need not worry about the patent if you use our method for non-commercial
purposes, such as research or education.

As far as software is concerned, you will see it in the near future,
possibly as open-source C code downloadable from our web site.
We do not have the resources of AT&T or Microsoft, so our development
process is slow.
We also do not want to release insufficiently tested or sub-optimal
implementations of an otherwise extremely robust and general method, so you
can wait for our implementation or implement it yourself (in its very basic
form our method can be programmed using, e.g., Matlab, in 20 minutes or so).

I would like to reiterate that, in our opinion, all conventional methods
of pitch detection (e.g. correlation-, spectrum- or cepstrum-based) are
not viable anymore: they cannot even approach the level of robustness and
generality demonstrated by our new method.
It is only a matter of time (and of the availability of quality software)
until all those algorithms (ESPS get_f0 etc.) are replaced with a standard
algorithm implementing our method (or, more likely, with several different
implementations intended for different purposes, e.g. low-bit-rate vocoders,
tools for speech analysis, etc.).

Best regards,

Dmitry Terez

SoundMath Technologies, LLC