Spectrogram Inversion Toolbox (Malcolm Slaney)


Subject: Spectrogram Inversion Toolbox
From:    Malcolm Slaney  <malcolm@xxxxxxxx>
Date:    Mon, 4 Aug 2014 13:58:03 -0700
List-Archive:<http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

I'm happy to announce that the Spectrogram Inversion Toolbox is now available. This is Matlab code that finds a waveform that best fits a given spectrogram.

You might ask why this is being announced on the auditory mailing list. The first time I needed this was when I was working on our correlogram inversion work. It's also a function included in the NSL (Neural Systems Laboratory) toolbox from Univ. of Maryland. (Although this new implementation is faster and better.) And it's useful if you are doing anything like audio morphing.

The source code is online now at
    http://research.microsoft.com/en-US/downloads/5ee40a69-6bf1-43df-8ef4-3fb125815856/default.aspx

And more details are below. Enjoy.

--- Malcolm

The Spectrogram Inversion Toolbox allows one to create spectrograms from audio and, more importantly, to estimate the audio that generates any given spectrogram. This is useful because one often wants to think about, and modify, sounds in the spectrogram domain.

There are two big problems with spectrogram inversion: first, and most importantly, one (generally) drops the phase when computing a spectrogram; second, not every (spectrogram) image corresponds to a valid waveform. This code finds the waveform whose magnitude spectrogram is most like the input spectrogram.

The easy solution is to just do the inversion assuming some phase (like 0). Back in the time domain you get an answer, but there is a lot of destructive interference because the segments of adjacent frames do not have consistent phase. Some people advocate starting with a random phase.

A better solution to this problem is to use an iterative algorithm proposed by Griffin and Lim many decades ago. It does converge, but slowly.

An even better solution is to do the inversion while explicitly looking for a good set of phases. This toolbox does that, after the inverse Fourier transform of each slice, by finding the best time delay so that the new frame and the frames summed so far are consistent. This is equivalent to starting with some arbitrary linear phase. The effect of this is to reduce the reconstruction error by an order of magnitude. Hurray.
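For readers who want to see the zero-phase baseline and the Griffin and Lim iteration mentioned in the announcement side by side, here is a minimal sketch in plain Python. It is not the toolbox's Matlab code and it does not implement the toolbox's time-delay search. The frame length, hop size, window, test signal, and all function names are assumptions chosen only for illustration, and the naive DFT is there to keep the example dependency-free, not for speed.

```python
import cmath
import math

N = 16    # frame length (assumed for illustration)
HOP = 4   # hop size, i.e. 75% overlap (assumed)
# Hann window sampled at half-integer offsets so every weight is > 0.
WIN = [0.5 - 0.5 * math.cos(2 * math.pi * (n + 0.5) / N) for n in range(N)]


def dft(x, sign):
    """Naive O(N^2) DFT (sign=-1) or unnormalised inverse DFT (sign=+1)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def stft(x):
    """Windowed DFT of each frame; only frames fully inside x are used."""
    return [dft([WIN[n] * x[s + n] for n in range(N)], -1)
            for s in range(0, len(x) - N + 1, HOP)]


def istft(frames, length):
    """Least-squares overlap-add: sum(w * frame) / sum(w^2)."""
    num = [0.0] * length
    den = [0.0] * length
    for m, spec in enumerate(frames):
        seg = [v.real / N for v in dft(spec, +1)]
        for n in range(N):
            num[m * HOP + n] += WIN[n] * seg[n]
            den[m * HOP + n] += WIN[n] ** 2
    return [a / d if d > 0 else 0.0 for a, d in zip(num, den)]


def spectral_error(mag, x):
    """Euclidean distance between the target magnitudes and |STFT(x)|."""
    return math.sqrt(sum((abs(v) - a) ** 2
                         for row, target in zip(stft(x), mag)
                         for v, a in zip(row, target)))


def griffin_lim(mag, length, iterations):
    """Return (waveform, error history), starting from zero phase."""
    # Zero-phase baseline: invert the magnitudes as if all phases were 0.
    x = istft([[complex(a) for a in row] for row in mag], length)
    errors = [spectral_error(mag, x)]
    for _ in range(iterations):
        est = stft(x)
        # Keep the target magnitude, borrow the phase of the current estimate.
        frames = [[a * (v / abs(v)) if abs(v) > 1e-12 else complex(a)
                   for v, a in zip(row, target)]
                  for row, target in zip(est, mag)]
        x = istft(frames, length)
        errors.append(spectral_error(mag, x))
    return x, errors


# Toy input: two sinusoids, reduced to a magnitude-only spectrogram.
LENGTH = 64
signal = [math.sin(2 * math.pi * 5 * t / LENGTH)
          + 0.5 * math.sin(2 * math.pi * 11.3 * t / LENGTH)
          for t in range(LENGTH)]
target = [[abs(v) for v in row] for row in stft(signal)]

estimate, errors = griffin_lim(target, LENGTH, 30)
print("zero-phase error:    %.4f" % errors[0])
print("after 30 iterations: %.4f" % errors[-1])
```

The first entry of the error history is the zero-phase inversion, and each later entry is one more pass of the iteration. With the least-squares overlap-add used here the error should never increase from one pass to the next, which makes it easy to watch how quickly (or slowly) the estimate settles for a given spectrogram.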


This message came from the mail archive
http://www.auditory.org/postings/2014/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University