Dear Laszlo,

I think this paper might contain something of what you're looking for ...

Indeed, it is a well-known phenomenon that phone error rate is not
necessarily a good predictor of word error rate in ASR (although it is hard
to pin down definitive references).  This counterintuitive behaviour can
arise for a number of reasons: the algorithms used to estimate model
parameters will arrive at different optima depending on whether the models
are trained using phone-level annotation (which could include phone boundary
information) or word-level annotation (which would not); the optimisation
criteria are often based on goodness of fit rather than on the outcome of
classification, and hence there can be differential effects on the two error
rates; there may be distributional differences between the set of
context-dependent phone models used to form word models and the
context-independent phone labels used to evaluate performance; and phones
are not uniformly distributed across words.
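
To make the last point concrete, here is a toy Python sketch of my own (not
taken from any paper).  The phone strings and the two "systems" are invented
for illustration, and a word is counted as correct only if every one of its
phones matches, which ignores the lexicon and language model a real decoder
would use.  It only shows that the two error rates can rank systems
differently when phone errors are spread differently over words.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def error_rates(ref_words, hyp_words):
    """Return (phone error rate, word error rate) for one utterance.

    Each word is a tuple of phone labels; a word is treated as a single
    token at the word level, so any phone error makes the whole word wrong.
    """
    ref_phones = [p for w in ref_words for p in w]
    hyp_phones = [p for w in hyp_words for p in w]
    per = edit_distance(ref_phones, hyp_phones) / len(ref_phones)
    wer = edit_distance(ref_words, hyp_words) / len(ref_words)
    return per, wer


# Invented reference transcription: 4 words, 11 phones.
reference = [("dh", "ax"), ("k", "ae", "t"), ("s", "ae", "t"), ("d", "aw", "n")]

# Hypothetical system A: three phone errors, all inside the last word.
system_a = [("dh", "ax"), ("k", "ae", "t"), ("s", "ae", "t"), ("t", "ow", "m")]

# Hypothetical system B: two phone errors, one in each of two words.
system_b = [("dh", "ax"), ("k", "ae", "p"), ("s", "ae", "t"), ("d", "aw", "m")]

per_a, wer_a = error_rates(reference, system_a)
per_b, wer_b = error_rates(reference, system_b)

print(f"System A: PER = {per_a:.3f}, WER = {wer_a:.3f}")
print(f"System B: PER = {per_b:.3f}, WER = {wer_b:.3f}")
```

In this toy case system B has the lower phone error rate (2/11 against 3/11)
but the higher word error rate (2/4 against 1/4).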

I'm not sure whether more recent (discriminative) training schemes such as
MCE and MPE are more or less likely to exhibit this phenomenon.

> Dear List,
> I know that speech recognition is a bit off-topic here, but I don't know
> of a more proper place to ask this. A reviewer wrote about a paper of
> mine that "the fact that better phone recognition does not necessarily
> mean better word recognition is already known, and people have been
> talking about it very frequently. This should be made clear and properly
> referenced in the paper". Unfortunately, I'm personally sure that I've
> never seen this written down, because it would have saved me a lot of
> work -- but, unfortunately, I had to learn it from my own failures,
> so I'm sure I won't be able to recall any references for this. I'm also
> unable to figure out how to turn this thing into a reasonable Google
> search term (actually, I've just managed to find a reference for just the
> opposite - that "better phone recognition undoubtedly leads to better word
> recognition"). So, if anyone can tell me any paper stating or showing
> results that "better phone recognition does not necessarily mean better
> word recognition", I would be very grateful.
> Thanks,
>                Laszlo Toth
>         Hungarian Academy of Sciences         *
>   Research Group on Artificial Intelligence   *   "Failure only begins
>      e-mail: tothl@xxxxxxxxxxxxxxx            *    when you stop trying"
