
Postdoc position at IRISA, Rennes, France

Dear list,

We are seeking to recruit a postdoctoral researcher on the statistical modeling of multichannel audio, applied to speaker segmentation and separation (full subject below). The successful candidate will work under the supervision of Drs. Guillaume Gravier and Emmanuel Vincent in the METISS group at IRISA, which has a newly equipped room dedicated to the exploration of future meeting environments.

Prospective candidates should have a background in multichannel signal processing or speech processing, and should have obtained their PhD less than one year ago or be about to obtain one. Informal enquiries may be made to Emmanuel Vincent (emmanuel.vincent@xxxxxxxx) or Guillaume Gravier (guillaume.gravier@xxxxxxxx).

This appointment is for 2 years, starting in summer or fall 2007. The salary will be 28,000 euros per annum. Applications must be submitted online before March 31st at

Joint statistical modeling of spectral, temporal and spatial audio features, applied to speaker segmentation and separation

Most audio signals represent complex sound scenes consisting of several overlapping sources (speakers, natural sounds, musical instruments). These sources are usually located at different spatial positions and exhibit different spectro-temporal characteristics. Processing such recordings involves several challenging tasks, such as separation, segmentation and, more generally, the description of each source.

Existing description algorithms are mostly designed for single-microphone recordings and rely on statistical modeling of spectral features. Yet in many application environments multiple microphones are available, providing valuable spatial information. Beamforming algorithms are then typically employed to determine, at each instant, the number of sources and their locations from spatial features. These algorithms can improve the detection of overlapping sources; however, their robustness decreases with small microphone arrays or moving sources.
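As an illustration of the kind of spatial feature such localization methods rely on, here is a minimal sketch (not the group's code) of time-difference-of-arrival estimation between two microphones with GCC-PHAT, a classical technique. It assumes NumPy, a single dominant source and a sampling rate `fs`; the function name `gcc_phat_delay` and the `max_delay` bound are illustrative choices.

```python
import numpy as np


def gcc_phat_delay(x, y, fs, max_delay=None):
    """Estimate how much y lags x (in seconds) with GCC-PHAT.

    A positive result means the signal reaches microphone y after microphone x.
    """
    n = len(x) + len(y)  # zero-padding avoids circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)
    # PHAT weighting: whiten the cross-spectrum so that only phase (delay) remains
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_delay is not None:
        # Physically possible lags are bounded by mic spacing / speed of sound
        max_shift = min(int(round(max_delay * fs)), max_shift)
    # Reorder so that index i corresponds to lag (i - max_shift)
    cc = np.concatenate((cc[n - max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs
```

For a small array, `max_delay` would be set from the geometry; with 10 cm spacing and a speed of sound of about 343 m/s it is roughly 0.29 ms. Restricting the search range this way is one simple guard against the spurious peaks that make small arrays less robust.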

The goal of this project is to define a unified statistical modeling framework for the joint exploitation of spectral, temporal and spatial information in multichannel audio signals. Dynamic state-based models offer a promising approach: the extracted spectral and spatial features are described as a function of hidden states associated with the different sources and their positions. A first stage of the project could consist of extending the state-of-the-art single-microphone segmentation model developed in our lab (based on GMMs) with spatial features obtained from classical source localization and separation techniques (e.g. ICA, DUET, beamforming).
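A minimal sketch of the joint modeling idea, assuming per-frame spectral features (e.g. MFCC-like vectors) and spatial features (e.g. TDOAs) have already been extracted at the same frame rate: the two are concatenated and scored against one diagonal-covariance GMM per state, each state standing for a hypothetical speaker/position pair. This is not the lab's model; the function names and the `(weights, means, variances)` tuple layout are hypothetical.

```python
import numpy as np


def diag_gmm_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (K,); means and variances: (K, D).
    """
    diff = frames[:, None, :] - means[None, :, :]              # (T, K, D)
    log_det = np.sum(np.log(2 * np.pi * variances), axis=1)    # (K,)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)  # (T, K)
    comp = np.log(weights)[None, :] - 0.5 * (log_det[None, :] + mahal)
    # Log-sum-exp over mixture components for numerical stability
    peak = comp.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(comp - peak).sum(axis=1, keepdims=True)))[:, 0]


def joint_features(spectral, spatial):
    """Concatenate per-frame spectral (T, Ds) and spatial (T, Dp) features."""
    return np.hstack([spectral, spatial])


def state_logliks(frames, models):
    """Return a (T, S) matrix with one column per state; each state is a
    GMM given as a (weights, means, variances) tuple."""
    return np.stack([diag_gmm_loglik(frames, *m) for m in models], axis=1)
```

Plain concatenation weights both feature streams equally. Weighting each stream's log-likelihood separately is a common alternative when one stream is less reliable, for instance spatial features from a small array.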

The proposed framework will primarily be applied to speaker segmentation and separation, that is, determining the structure of a speech recording according to the question "who spoke when and where" and extracting the signal of each speaker. The results will be evaluated on meeting data recorded with small microphone arrays, using data from the NIST meeting evaluation along with data recorded in our lab's room dedicated to the exploration of future meeting environments.
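To answer "who spoke when", per-frame state log-likelihoods (such as one GMM score per speaker) can be smoothed into speaker turns with Viterbi decoding over a hidden Markov model. The sketch below is illustrative only: it assumes a uniform initial distribution, a hypothetical "sticky" transition matrix controlled by `self_loop`, at least two states, and a frame hop `hop`; overlapping speech is not modeled.

```python
import numpy as np


def viterbi(loglik, self_loop=0.99):
    """Most likely state sequence given per-frame log-likelihoods (T, S).

    Stay in the same state with probability self_loop, otherwise switch
    uniformly to one of the other S - 1 states (requires S >= 2).
    """
    T, S = loglik.shape
    log_A = np.full((S, S), np.log((1.0 - self_loop) / (S - 1)))
    np.fill_diagonal(log_A, np.log(self_loop))
    delta = loglik[0] - np.log(S)  # uniform initial distribution
    backptr = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A  # scores[i, j]: move from state i to j
        backptr[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + loglik[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = backptr[t, path[t]]
    return path


def to_segments(path, hop):
    """Turn a frame-level state path into (start, end, state) turns."""
    segments, start = [], 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            segments.append((start * hop, t * hop, int(path[start])))
            start = t
    return segments
```

The high self-loop probability is what keeps a single ambiguous frame from producing a spurious speaker change; with spatial features in the state model, the same decoding would also track "where".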

Emmanuel Vincent
METISS Project
Campus de Beaulieu, 35042 Rennes cedex, France
Phone: +332 9984 7227 - Fax: +332 9984 7171
Web: http://www.irisa.fr/metiss/members/evincent/