Recently, I started looking into extracting dynamic speech features from multiple frames (convolutive features).
For example, the delta and delta-delta MFCC coefficients used in an HMM input vector are derived from cepstra that are each extracted from a single frame.
In contrast, a convolutive feature is extracted from multiple frames at once.
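To make the distinction concrete, here is a minimal sketch in Python with NumPy. It assumes the static features are already available as a (frames x coefficients) array. The function names `delta` and `stack_frames`, the window sizes, and the random placeholder data are only illustrative and do not come from any particular toolkit. Frame stacking is shown as the simplest form of multi-frame input, not as the convolutive method itself.

```python
import numpy as np

def delta(feats, N=2):
    """Regression-based delta computed from per-frame features.

    feats: array of shape (T, D), one static feature vector per frame.
    N: number of neighbouring frames used on each side (assumed value).
    """
    T = feats.shape[0]
    # Repeat the first and last frames so every frame has N neighbours.
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(feats, dtype=float)
    for n in range(1, N + 1):
        # padded[N + n : N + n + T] holds frame t+n, padded[N - n : N - n + T] holds frame t-n.
        out += n * (padded[N + n : N + n + T] - padded[N - n : N - n + T])
    return out / denom

def stack_frames(feats, context=2):
    """Concatenate each frame with its +/- context neighbours (multi-frame input)."""
    T = feats.shape[0]
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i : i + T] for i in range(2 * context + 1)])

# Placeholder data standing in for 100 frames of 13 MFCCs (not real speech).
mfcc = np.random.default_rng(0).standard_normal((100, 13))

# Conventional HMM input: static + delta + delta-delta, shape (100, 39).
d1 = delta(mfcc)
d2 = delta(d1)
conventional = np.hstack([mfcc, d1, d2])

# Multi-frame input: 5 consecutive frames per vector, shape (100, 65).
multi_frame = stack_frames(mfcc, context=2)
```

In the first case the temporal information enters only through a fixed regression on per-frame cepstra, while in the second a later transform can learn from the raw multi-frame block directly.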
First, I searched for recent papers using keywords such as 'speech', 'dynamic', 'convolutive', 'multi-frame', and 'feature'.
However, I could not find any papers related to this field, and it was hard to tell how much work has been done in it.
Could you please help me find recent work (perhaps a paper) on multi-frame features in speech?
I would love to hear any advice you may have.
Choong Hwan Choi
Computational NeuroSystems Laboratory
Department of BioSystems
Korea Advanced Institute of Science and Technology