University of West Bohemia in Pilsen

Audio-visual speech synthesis

Investigator

Ing. Krňoul Zdeněk  <zdkrnoul@kky.zcu.cz>

Speech synthesis

Speech synthesis is the imitation of human speech by a computer. It comprises acoustic and visual speech synthesis: acoustic synthesis produces the component of speech that can be heard, and visual synthesis produces the component that can be seen. Joint audio-visual synthesis is known as a talking head or TTAVS system. The scheme of the system is depicted in Fig. 1. The input of the system is a sequence of phonemes with prosodic information; the output is an audio-visual animation of speech.

TTAVS system
Figure 1: Scheme of a talking head system.
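
The input described above can be pictured as a small data structure. The following is only an illustrative sketch, not the project's implementation: the class names, the choice of duration and pitch as the prosodic fields, and the 25 fps frame rate are all assumptions made for the example. It shows how one phoneme sequence could define a single timeline that both the acoustic and the visual synthesis follow, which keeps sound and lip movement in step.

```python
from dataclasses import dataclass


@dataclass
class Phoneme:
    """One input unit (hypothetical layout, assumed for illustration)."""
    symbol: str        # phoneme label
    duration_ms: int   # prosodic information: duration
    pitch_hz: float    # prosodic information: pitch (0.0 for unvoiced sounds)


@dataclass
class Segment:
    """A phoneme placed on the shared audio-visual timeline."""
    symbol: str
    start_ms: int
    end_ms: int


def shared_timeline(phonemes):
    """Lay the phonemes end to end and return their time segments."""
    segments = []
    t = 0
    for p in phonemes:
        segments.append(Segment(p.symbol, t, t + p.duration_ms))
        t += p.duration_ms
    return segments


def video_frame_count(segments, fps=25):
    """Number of video frames needed to cover the timeline (rounded up)."""
    total_ms = segments[-1].end_ms if segments else 0
    return (total_ms * fps + 999) // 1000


# Example input: four phonemes with assumed durations and pitch values.
utterance = [
    Phoneme("a", 80, 120.0),
    Phoneme("h", 60, 0.0),
    Phoneme("o", 120, 110.0),
    Phoneme("j", 100, 105.0),
]
timeline = shared_timeline(utterance)
frames = video_frame_count(timeline)
```

In such a design the acoustic module would render sound for each segment and the visual module would pick a mouth shape for every frame falling inside it; both steps are left out here because the page does not describe them.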

When a 3D model of the head, reconstructed from recordings of a real person's head, is combined with synthesized speech imitating the voice of the same person, the TTAVS system can also be called a virtual double.

Head
Figure 2: Photograph of a real person and her computer-generated virtual double.