Speech communication with machines is a research area in its own right, encompassing speech recognition and synthesis as well as speaker and language recognition. Large efforts have been devoted to this field, and much progress has been reported in the recent past. However, communication between humans usually involves several modalities, such as speech combined with face and lip ``reading.'' This makes communication more robust, and it can also become more natural and more efficient when speech is used together with gesture and vision. Bringing this multimodal ability to human--machine communication is now a major challenge and raises difficult problems, such as the integration over time of speech and gesture stimuli, or the use of a common visual reference shared in a human--machine dialog. Another interesting aspect of work in this area is the possibility of transferring information from one modality to another, such as vision to speech or speech to gesture, in order to help handicapped people communicate better with machines. Finally, it is thought that, in the long term, the study of multimodal communication will be necessary even for developing systems that use a single communication mode.