Re: 1. MP3 or AAC mixing in compressed/coded domain (2) (Dan Ellis)


Subject: Re: 1. MP3 or AAC mixing in compressed/coded domain (2)
From:    Dan Ellis  <dpwe@xxxxxxxx>
Date:    Tue, 3 Mar 2009 07:36:17 -0500
List-Archive:<http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

> I am asking about mixing directly in compressed format.
>
> some reasons to pursue that:
>
> 1. each time you uncompress edit and re-compress, you lose audio quality.

You lose audio quality because the original signal has been quantized in its MP3 representation, which is equivalent to adding some random offsets to each sample. If you decode and re-encode, you potentially re-quantize to a different quantization level, adding still more random offsets. If you stay in the quantized domain, you can avoid that. But if you are actually changing the waveform (by adding signals together), you cannot avoid changing the quantization levels, so I think the quality loss is unavoidable - the damage was actually done when the original signal was quantized to MP3, and you can't undo that.

> 2. it's take too long to re-encode mp3 file after mixing in pcm format,
> which is unacceptable for some real-time (or close to real-time)
> application.

I'm not sure exactly how the compute time breaks down in an MP3 encoder, but the big thing that is slower in an encoder vs. a decoder is that it has to recompute the psychoacoustic masking and bit allocation. Again, if you change the actual subband signals, and if you wish to preserve good psychoacoustic masking, you will have to re-run these stages.

Compressed-domain processing should allow you to avoid the frequency transforms (initial polyphase filterbank and subsequent MDCT), since the subband representation is still linear even after those initial stages. However, if you want to have the long/short MDCT window switching properly done for your new signal, you will need to redo the MDCT stage too. So I think the polyphase filter is the only part you can save without compromising encoding quality, which is probably less than a quarter of the total processing.

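To make the requantization point in the reply to item 1 concrete, here is a toy sketch in Python. It is only an illustration under loud assumptions: it uses a plain uniform scalar quantizer on random values, whereas a real MP3 encoder quantizes MDCT coefficients nonuniformly with per-band scale factors. The function names and the step sizes (0.05, 0.07, 0.06) are made up for the example.

```python
import math
import random

def quantize(x, step):
    """Toy uniform quantizer: round each value to the nearest multiple of step."""
    return [step * round(v / step) for v in x]

def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

def mix_demo(n=4096, step_a=0.05, step_b=0.07, step_out=0.06, seed=0):
    """Compare the error of a mix before and after it is quantized again.

    All step sizes are arbitrary illustrative values, not MP3 parameters.
    """
    rng = random.Random(seed)
    a = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]

    # The mix we would get if neither signal had ever been coded.
    ideal = [p + q for p, q in zip(a, b)]

    # What the two coded files actually hold: already-quantized values.
    qa = quantize(a, step_a)
    qb = quantize(b, step_b)

    # Summing them is exact, but the result no longer sits on one
    # quantization grid, so coding the mix means quantizing once more.
    mixed = [p + q for p, q in zip(qa, qb)]
    requantized = quantize(mixed, step_out)

    return {
        "first_generation": rms_error(ideal, mixed),
        "after_requantization": rms_error(ideal, requantized),
    }

if __name__ == "__main__":
    for name, err in mix_demo().items():
        print(f"{name}: RMS error {err:.4f}")
```

In this toy model the first-generation error is already present in the summed values, and quantizing the mix again adds to it, whichever domain the addition is done in.
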
If you're prepared to sacrifice audio quality, there is probably something hacky you could do that would be much quicker, like choosing each quantized subband from only one of the two signals depending on which had greater energy in that band, which would give you a kind of mixture. But you'd still run into trouble if they had different short/long MDCT windows in any particular frame.

DAn.
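A rough sketch of the band-selection hack described above, for one frame. The data layout is hypothetical: each frame is assumed to be a list of bands, each band a list of dequantized coefficient values, plus a window-type label. A real implementation would copy the coded bits and scale factors of the chosen band rather than these values, and would need some policy for mismatched windows; this sketch simply refuses them.

```python
def band_energy(coeffs):
    """Sum of squared coefficient values in one band."""
    return sum(c * c for c in coeffs)

def select_mix(frame_a, frame_b, window_a="long", window_b="long"):
    """Build a pseudo-mix by taking each band from the louder signal.

    frame_a, frame_b: lists of bands (hypothetical layout, see above).
    window_a, window_b: window-type labels for the two frames.
    Returns (mixed_frame, picks), where picks[i] is "a" or "b".
    """
    if window_a != window_b:
        # Bands from long and short windows are not interchangeable,
        # so this simple scheme has no answer for such frames.
        raise ValueError("frames use different MDCT window types")
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of bands")

    mixed, picks = [], []
    for band_a, band_b in zip(frame_a, frame_b):
        if band_energy(band_a) >= band_energy(band_b):
            mixed.append(list(band_a))
            picks.append("a")
        else:
            mixed.append(list(band_b))
            picks.append("b")
    return mixed, picks

if __name__ == "__main__":
    a = [[1.0, 0.0], [0.1, 0.1]]
    b = [[0.5, 0.5], [0.3, -0.2]]
    print(select_mix(a, b))
```

Note that nothing is added here: the weaker signal in each band is simply discarded, which is why this gives only "a kind of mixture".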


This message came from the mail archive
http://www.auditory.org/postings/2009/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University