11 research outputs found

    Fundamentals of Speech Synthesis and Speech Recognition

    Full text link

    Lenition of |h| and glottal stop

    No full text

    Speech Driven 3-D Face Point Trajectory Synthesis Algorithm

    No full text
    This paper presents a novel algorithm which generates three-dimensional face point trajectories for a given speech file with or without its text. The proposed algorithm first employs an off-line training phase. In this phase, recorded face point trajectories along with their speech data and phonetic labels are used to generate phonetic codebooks. These codebooks consist of both acoustic and visual features. Acoustics are represented by line spectral frequencies (LSF), and face points are represented with their principal components (PC). During the synthesis stage, speech input is rated in terms of its similarity to the codebook entries. Based on the similarity, each codebook entry is assigned a weighting coefficient. If the phonetic information about the test speech is available, this is utilized in restricting the codebook search to only several codebook entries which are visually closest to the current phoneme (a visual phoneme similarity matrix is generated for this purpose). Then t..

    Voice Conversion By Codebook Mapping Of Line Spectral Frequencies And Excitation Spectrum

    No full text
    This paper presents a new scheme for developing a voice conversion system that modifies the utterance of a source speaker to sound like speech from a target speaker. We refer to the method as Speaker Transformation Algorithm using Segmental Codebooks (STASC). Two new methods are described to perform the transformation of vocal tract and glottal excitation characteristics across speakers. In addition, the source speaker's general prosodic characteristics are modified using time-scale and pitch-scale modification algorithms. Informal listening tests suggest that convincing voice conversion is achieved while maintaining high speech quality. The performance of the proposed system is also evaluated on a standard Gaussian mixture model based speaker identification system, and the results show that the transformed speech is assigned higher likelihood by the target speaker model when compared to the source model. 1 Introduction There has been a considerable amount of research effort directed..
    corecore