
Re: Speech(or Phase) Reconstruction from Magnitude Spectrum


I'm not sure if it's already been mentioned, but this article
demonstrates the (relatively easy) conditions under which exact signal
reconstruction from the magnitude STFT is possible.  I think they also
give an algorithm:

 abstract="any signal can be reconstructed from magnitude of STFT if
     overlap >= half window length using linear equations based
     on |X|^2=FFT(autocorrelation). Interesting, but
     pedagogically dangerous because it obscures the more
     general but less efficient DCT reconstruction theorem.",
 author="S.~Hamid Nawab and Thomas F. Quatieri and Jae S. Lim",
 keywords="speech coding, digital signal processing",
 title="Signal Reconstruction from Short-Time Fourier Transform
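
(A quick numerical check of the identity the abstract leans on, |X|^2 = FFT(autocorrelation). This is only a sketch of that one relation, not the reconstruction algorithm from the paper; the frame values, the helper names and the naive DFT are my own assumptions for illustration.)

```python
import cmath

def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def circular_autocorrelation(x):
    """r[m] = sum over t of x[t] * conj(x[(t - m) mod N])."""
    n = len(x)
    return [sum(x[t] * complex(x[(t - m) % n]).conjugate() for t in range(n))
            for m in range(n)]

frame = [1.0, -2.0, 3.0, 0.5]          # arbitrary example frame (assumption)
padded = frame + [0.0] * len(frame)    # zero-pad so the circular lags do not wrap

power_spectrum = [abs(v) ** 2 for v in dft(padded)]
from_autocorr = dft(circular_autocorrelation(padded))

# Largest difference between |X|^2 and the DFT of the autocorrelation.
max_err = max(abs(p - r) for p, r in zip(power_spectrum, from_autocorr))
print(max_err)
```

The two sides agree to rounding error, which is why knowing the magnitude of each short-time spectrum amounts to knowing the autocorrelation of each windowed frame.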

Matt Flax wrote:

This topic is very much signal processing (DSP). You will find efficient
solutions by discussing it on the music-dsp e-mail list :

Yes, you are correct. You do want to 'complexify' the magnitude-only
signal. You are now going down a well-trodden road, so let me
propose another approach ...

Rather than thinking about the instantaneous phase of the signal, consider
how the signal will be processed in sequential blocks ... how do you
combine blocks (windows) of processed signal?
You may want to look into the standard overlap-add technique and combine
it with your current direction.
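
(To make the overlap-add idea concrete, here is a minimal sketch. It assumes a periodic Hann window with 50% overlap, chosen because the overlapping windows then sum to one; the function names and the test signal are hypothetical. In a real application each frame would be processed between the split and the add.)

```python
import math

def hann_periodic(n):
    """Periodic Hann window; copies shifted by n/2 sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def split_frames(signal, frame_len, hop):
    """Cut the signal into windowed frames that start every `hop` samples."""
    window = hann_periodic(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + i] * window[i] for i in range(frame_len)])
    return frames

def overlap_add(frames, hop):
    """Sum the frames back together at their original positions."""
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for idx, frame in enumerate(frames):
        start = idx * hop
        for i, value in enumerate(frame):
            out[start + i] += value
    return out

frame_len, hop = 16, 8                                # 50% overlap (assumption)
signal = [math.sin(0.3 * n) for n in range(64)]       # arbitrary test signal

frames = split_frames(signal, frame_len, hop)
rebuilt = overlap_add(frames, hop)

# The first and last half frame are covered by only one window, so compare
# the interior, where two windows overlap.
interior_err = max(abs(rebuilt[i] - signal[i])
                   for i in range(hop, len(signal) - hop))
print(interior_err)
```

With unprocessed frames the interior comes back unchanged, so any difference you hear after processing comes from what was done to the blocks, not from the way they were recombined.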

Back to your topic ... with a slightly different approach ...
This complexification can take many standard forms. They
include minimum phase, maximum phase, zero phase and also mixed phase.
The 'phase' relates to where the signal energy is concentrated in the time domain.

Say you do a zero phase realisation, then the overall signal power will
fluctuate according to the STFT power in each Fourier block of data. So
if you keep your block resolution small enough, you should be able to
get a pretty good signal in the end .... this is in some way connected
to the question ... "What is the best sized window required to represent
speech ... ". The answer to that question must be, well, what do you
want to represent best ?!@# and can be quite a complex issue ...
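
(A small sketch of what a zero phase realisation of one block might look like: keep the magnitude spectrum, set every phase to zero and transform back. The helper names, the example block and the naive DFT are assumptions for illustration, not the attached script.)

```python
import cmath

def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]

def zero_phase_block(magnitudes):
    """Time-domain block whose spectrum is the given magnitudes with zero phase."""
    return [v.real for v in idft(magnitudes)]

block = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0]    # arbitrary real block
magnitudes = [abs(v) for v in dft(block)]           # the phase is discarded here

zp = zero_phase_block(magnitudes)
n = len(zp)

# The magnitude spectrum is preserved ...
mag_err = max(abs(abs(v) - m) for v, m in zip(dft(zp), magnitudes))
# ... and the result is symmetric about t = 0, where its peak sits.
sym_err = max(abs(zp[t] - zp[(n - t) % n]) for t in range(n))
print(mag_err, sym_err, zp[0])
```

The zero phase block has the same magnitude spectrum as the original but its energy is centred on the start of the block (wrapping around the end), which is why the block size matters so much to how the result sounds.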

I attach the opposite of what you want to do ... if you invert this one
line algorithm then you will find your answer !!! Pretend the signal in
the script is not in the time domain, but the frequency domain ...
in other words whatever domain you put into the signal, you get out of
the algorithm ... time -> time, frequency -> frequency, f(freq) ->
f(freq) and so on....

Be careful and remember some signals are energy and some signals are
power ... these are non-linearly related ... so step your
algorithm carefully from reading in the data to writing it out ...




%# Copyright 2004 Matt Flax <flatmax@xxxxxxxx>
%# This file is a stand alone tool for generating a zero phase
%# signal from a complex time signal
%# It is free software; you can
%# redistribute it and/or modify
%# it under the terms of the GNU General Public License as published by
%# the Free Software Foundation; either version 2 of the License, or
%# (at your option) any later version.
%# This file is distributed in the hope that it will be useful,
%# but WITHOUT ANY WARRANTY; without even the implied warranty of
%# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
%# GNU General Public License for more details.
%# You have received a copy of the GNU General Public License
%# along with this file, if not then please refer to www.gnu.org
%# to gain access to the GNU GPL license.

function [rSig,cSig]=complexSigToRealSig(complexSig)

  %# converts complexSig to rSig, with zero vector cSig returned.
  %# This function is a zero phase implementation.