
Impairment Factor Framework for Wide-band Speech Codecs

This mail is intended for speech quality estimation experts, especially the authors of [1], i.e., Moeller et al. I would be grateful if the authors or others could reply to clarify my queries.

In [1], the authors propose a framework for deriving equipment impairment factors for wideband speech codecs within the E-model (ITU-T G.107).

1) The authors extend the upper limit of R in an NB/WB context to 129 using equation (1). A few things confuse me. First, in Figure 1(a), for the same set of distortion conditions, Rnb reaches around 130, whereas Rnb/wb peaks at 100. The authors argue that, because of the MOS-to-R conversion formula, Rnb/wb can reach at most 100, but then how does Rnb reach 130? On the other hand, they state in the text (p. 1970, col. 2, last paragraph) that Rnb/wb has to be stretched beyond its peak value of 100, and they do so using eq. (2), reproduced below:
    R = a \cdot \left( e^{R_{nb/wb}/b} - 1 \right)

This form was taken from eq. (1), in which the left-hand side is Rnb.
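The ceiling of 100 mentioned above comes from the E-model mapping between R and MOS. Below is a minimal sketch, assuming the standard G.107 R-to-MOS formula and a numerical inversion by bisection (the function names are my own, not taken from [1]). It shows why any MOS at or above 4.5 can only ever be converted back to R = 100.

```python
def r_to_mos(r: float) -> float:
    """G.107 mapping from the transmission rating R to estimated MOS."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7.0e-6


def mos_to_r(mos: float, tol: float = 1e-9) -> float:
    """Invert r_to_mos on [0, 100] by bisection (hypothetical helper).

    MOS at or above 4.5 maps to R = 100, the ceiling discussed in the
    text; MOS at or below 1.0 is clamped to R = 0.
    """
    if mos >= 4.5:
        return 100.0
    if mos <= 1.0:
        return 0.0
    lo, hi = 0.0, 100.0  # invariant: r_to_mos(lo) < mos <= r_to_mos(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if r_to_mos(mid) < mos:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(mos_to_r(4.5))  # 100.0
```

Because the formula saturates at MOS 4.5, no subjective score, however high, can be converted into R above 100; any extension beyond 100 therefore has to come from an extra mapping such as eq. (2).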

To the best of my understanding, for a given set of network conditions, Rnb is higher than Rnb/wb. This is apparent from the graphs as well as from equations (1) and (2). For instance, in eq. (1) Rnb is a function of Rnb/wb, and plugging values into the right-hand side gives Rnb = 129 when Rnb/wb = 100. On the other hand, the authors suggest that Rnb/wb be extended to 129 (i.e., by 29%). This seems contradictory. I am sure the problem lies in my own understanding, probably because the description of the test pairs (p. 1970, col. 2, para. 1) is not entirely clear to me.
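To make the arithmetic concrete, here is a small sketch of the exponential mapping as I read it, R = a(e^{Rnb/wb/b} - 1). The constants from [1] are not reproduced here: b is a purely hypothetical value, and a is simply chosen so that Rnb/wb = 100 maps to 129, matching the end point quoted above.

```python
import math

R_NBWB_MAX = 100.0  # ceiling of the MOS-to-R conversion in the mixed NB/WB test
R_EXT_MAX = 129.0   # upper end of the extended scale quoted in the text


def make_stretch(b: float):
    """Return R_ext(r) = a * (exp(r / b) - 1) with a fixed by the end point.

    b is a hypothetical curvature parameter (not the value used in [1]);
    a is solved so that r = 100 maps exactly onto 129.
    """
    a = R_EXT_MAX / (math.exp(R_NBWB_MAX / b) - 1.0)

    def stretch(r_nbwb: float) -> float:
        return a * (math.exp(r_nbwb / b) - 1.0)

    return stretch


stretch = make_stretch(b=100.0)  # hypothetical b
print(round(stretch(100.0), 6))  # 129.0
print(stretch(0.0))              # 0.0
```

Whatever b is, this form maps 0 to 0 and 100 to 129 and is monotonic in between; b only controls how strongly the upper end of the scale is stretched relative to the lower end.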

2) Provided the relationship above holds, Ie,wb values for both NB and WB codecs can be derived with this equation, as described above. However, this seems to hold only for mixed NB/WB ACR subjective tests like those described in the paper. My concern is whether it would still hold when an instrumental model such as PESQ-WB is applied to mixed NB/WB data. One reason I can think of: although PESQ-WB can evaluate both NB speech (via down-/up-sampling) and WB speech, PESQ-WB itself was presumably not trained in a mixed NB/WB mode. Would this have an effect? That said, the authors note at the beginning of the paper that "... judgements collected in a purely WB context do not significantly [differ] from those collected in a mixed NB/WB context", based on observations by Barriac et al. and Takahashi et al.

3) The nature of the work (and hints in the conclusions) suggests that it is a precursor of a new wideband standard. I would be grateful to learn whether that is actually the case.

4) Has anyone done any follow-up work along these lines?

I would be grateful to anyone who can address these queries, preferably the authors themselves.

Adil Raja

P.S. [1] S. Moeller, A. Raake, N. Kitawaki, A. Takahashi, and M. Waltermann, "Impairment Factor Framework for Wide-band Speech Codecs," IEEE Transactions on Audio, Speech, and Language Processing, Nov. 2006.
