26 research outputs found

    An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder

    Generative Adversarial Network (GAN) based vocoders are superior in both inference speed and synthesis quality when reconstructing an audible waveform from an acoustic representation. This study focuses on improving the discriminator for GAN-based vocoders. Most existing Time-Frequency Representation (TFR)-based discriminators are rooted in the Short-Time Fourier Transform (STFT), which has a constant Time-Frequency (TF) resolution, linearly scaled center frequencies, and a fixed decomposition basis, making it incompatible with signals like singing voices that require dynamic attention for different frequency bands and different time intervals. Motivated by that, we propose a Multi-Scale Sub-Band Constant-Q Transform (MS-SB-CQT) discriminator and a Multi-Scale Temporal-Compressed Continuous Wavelet Transform (MS-TC-CWT) discriminator. Both CQT and CWT have a dynamic TF resolution for different frequency bands. Comparing the two, CQT is better at modeling pitch information, and CWT is better at modeling short-time transients. Experiments conducted on both speech and singing voices confirm the effectiveness of our proposed discriminators. Moreover, the STFT, CQT, and CWT-based discriminators can be used jointly for better performance. The proposed discriminators can boost the synthesis quality of various state-of-the-art GAN-based vocoders, including HiFi-GAN, BigVGAN, and APNet. Comment: arXiv admin note: text overlap with arXiv:2311.1495
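The contrast the abstract draws between linearly scaled STFT center frequencies and the frequency-dependent resolution of the CQT can be sketched numerically. This is a minimal illustration using the textbook definitions of both transforms; the sample rate, FFT size, minimum frequency, and bin counts are assumed values chosen for the example, not settings taken from the paper.

```python
def stft_center_frequencies(sample_rate, n_fft):
    """STFT bin centers are linearly spaced: f_k = k * sample_rate / n_fft."""
    return [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]


def cqt_center_frequencies(f_min, n_bins, bins_per_octave):
    """CQT bin centers are geometrically spaced: f_k = f_min * 2^(k / B)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def cqt_quality_factor(bins_per_octave):
    """Q = f_k / (f_{k+1} - f_k), which is the same for every bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


# Assumed example settings (hypothetical, for illustration only).
stft = stft_center_frequencies(sample_rate=24000, n_fft=1024)
cqt = cqt_center_frequencies(f_min=32.7, n_bins=84, bins_per_octave=12)

# STFT: the spacing between neighbouring bins is constant in Hz.
print(stft[1] - stft[0], stft[101] - stft[100])

# CQT: the ratio between neighbouring bins is constant, so low bins are
# narrow in frequency and high bins are wide.
print(cqt[1] / cqt[0], cqt[61] / cqt[60])
print(cqt_quality_factor(12))
```

With 12 bins per octave each CQT bin spans one semitone, which is one way to see why a CQT-based representation lines up naturally with pitch.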

    SponTTS: modeling and transferring spontaneous style for TTS

    Spontaneous speaking style exhibits notable differences from other speaking styles due to various spontaneous phenomena (e.g., filled pauses, prolongation) and substantial prosody variation (e.g., diverse pitch and duration variation, occasional non-verbal speech like a smile), posing challenges to modeling and prediction of spontaneous style. Moreover, the scarcity of high-quality spontaneous data constrains spontaneous speech generation for speakers without spontaneous data. To address these problems, we propose SponTTS, a two-stage approach based on neural bottleneck (BN) features to model and transfer spontaneous style for TTS. In the first stage, we adopt a Conditional Variational Autoencoder (CVAE) to capture spontaneous prosody from a BN feature and incorporate the spontaneous phenomena through the constraint of a spontaneous phenomena embedding prediction loss. In addition, we introduce a flow-based predictor to predict a latent spontaneous style representation from the text, which enriches the prosody and context-specific spontaneous phenomena during inference. In the second stage, we adopt a VITS-like module to transfer the spontaneous style learned in the first stage to the target speakers. Experiments demonstrate that SponTTS is effective in modeling spontaneous style and transferring the style to the target speakers, generating spontaneous speech with high naturalness, expressiveness, and speaker similarity. The zero-shot spontaneous style TTS test further verifies the generalization and robustness of SponTTS in generating spontaneous speech for unseen speakers. Comment: 5 pages, 3 figures, Accepted by ICASSP202

    Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation

    The multi-codebook speech codec enables the application of large language models (LLMs) in TTS but bottlenecks efficiency and robustness due to multi-sequence prediction. To avoid this obstacle, we propose Single-Codec, a single-codebook single-sequence codec, which employs a disentangled VQ-VAE to decouple speech into a time-invariant embedding and a phonetically-rich discrete sequence. Furthermore, the encoder is enhanced with 1) contextual modeling with a BLSTM module to exploit the temporal information, 2) a hybrid sampling module to alleviate distortion from upsampling and downsampling, and 3) a resampling module to encourage discrete units to carry more phonetic information. Compared with multi-codebook codecs, e.g., EnCodec and TiCodec, Single-Codec demonstrates higher reconstruction quality with a lower bandwidth of only 304 bps. The effectiveness of Single-Codec is further validated by LLM-TTS experiments, showing improved naturalness and intelligibility. Comment: Accepted by Interspeech 202
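To make the bandwidth comparison concrete, the bitrate of a discrete codec can be estimated as frames per second times bits per frame. The sketch below uses that standard relation. The frame rates and codebook sizes are hypothetical values chosen for illustration and are not taken from the abstract, which reports only the 304 bps figure.

```python
import math


def codec_bitrate_bps(frame_rate_hz, codebook_size, n_codebooks=1):
    """Bits per second for a codec that emits, per frame, one index from
    each of `n_codebooks` codebooks with `codebook_size` entries each."""
    bits_per_frame = n_codebooks * math.log2(codebook_size)
    return frame_rate_hz * bits_per_frame


# Hypothetical single-codebook setting: 24 kHz audio downsampled by 1024
# (about 23.4 frames per second), one codebook with 8192 entries (13 bits).
single = codec_bitrate_bps(frame_rate_hz=24000 / 1024, codebook_size=8192)

# Hypothetical multi-codebook setting: 75 frames per second,
# 8 codebooks with 1024 entries each (10 bits per codebook).
multi = codec_bitrate_bps(frame_rate_hz=75, codebook_size=1024, n_codebooks=8)

print(single, multi)
```

Under these assumed settings the single-codebook stream is roughly twenty times smaller, and a language model has to predict one token sequence instead of eight parallel ones.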

    Polymorphism of Shaoxing breed ducks at microsatellite loci

    Microsatellite markers are now widely used to detect and describe micropopulation processes that occur in populations of domestic animals under various factors of breeding pressure. Microsatellite loci are distributed throughout eukaryotic genomes, making them the preferred genetic marker for high-resolution genetic mapping. In recent years, rapid advances have been made in the development of molecular genetic maps. High-density linkage maps are now available for many farm animals, such as cattle, pigs, and goats. In contrast, mapping studies in avian species are much less advanced, except in the chicken. According to the FAO, about 70% of ducks are bred in China, which makes the country a leader in duck farming. The Shaoxing breed is one of the three major duck breeds in China. Ducks of this breed are characterized by high performance. According to the Bureau of Product Quality, these birds reach maturity (the beginning of egg laying) at 130–140 days. A characteristic of the Shaoxing breed is that the peak laying period lasts from eight to ten months. On average, one duck lays 290 to 310 eggs in 500 days, which is one of the highest rates among egg breeds. The purpose of our study was therefore a microsatellite analysis of two populations of the Shaoxing breed at 9 loci. The birds were selected on the duck farms of Zhejiang Generation Biological Science and Technology Co., Ltd. and Zhuji Guowei Poultry Development Co, Ltd., and the analysis was carried out at the laboratory of the Zhejiang Academy of Sciences Institute. Sample collection and DNA preparation: venous blood samples were collected from 480 ducks (240 ducks of population I and 240 ducks of population II of the Shaoxing breed) into 3 ml tubes containing EDTA as an anticoagulant.
Of the 9 loci investigated in the Shaoxing breed populations, only one locus (SMO10) was monomorphic. The number of different alleles (Na) at each polymorphic locus ranged from 2 (SMO12) to 13 (APL79, CMO11) in population I and from 2 (APL78, SMO12) to 7 (APL79) in population II. On average, one locus had 5.889 alleles in population I and 3.889 alleles in population II. The effective number of alleles (Ne) was 1.735 in population I and 1.599 in population II. The number of alleles and the expected heterozygosity (Hexp) values can provide important information for the discrimination of individuals and breeds. The expected heterozygosity was 0.336 in population I and 0.307 in population II. The information index (I) was 0.702 in population I and 0.576 in population II. Private alleles were found in each population: 6 in population I and just 4 in population II. The results show a high level of polymorphism in the studied duck populations. The results obtained can be used in the creation of new lines of ducks.
The article presents the results of a study of the genetic structure of two populations of Shaoxing ducks using nine microsatellite loci. The birds were studied on the duck farms of Zhejiang Generation Biological Science and Technology Co., Ltd. and Zhuji Guowei Poultry Development Co, Ltd., with the support of the Poultry Genetics Laboratory of the Zhejiang Academy of Sciences (Zhejiang Province, PRC). The mean effective number of alleles (Ne) per locus was 1.735 in population I and 1.599 in population II. The information index was 0.702 (population I) and 0.576 (population II). The observed heterozygosity was 0.298 in population I and 0.269 in population II. Private alleles were found in each population. Of the 9 loci studied, 6 private alleles were found in population I, while population II had only 4. In total, 23 private alleles were found in population I and 5 in population II. The largest number of private alleles was at locus CMO11 (9), and the smallest, 1 allele each, at loci SMO7 and SMO10 in population I. Population II was poorer in private alleles: locus APL79 had 2, and CMO11, SMO7, and SMO10 had 1 each. The results indicate a high level of within-breed polymorphism in Shaoxing ducks, which makes it possible to develop strategies for conserving and using duck genetic resources based on the analysis of polymorphic microsatellite loci.
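The per-locus statistics named in the abstract (Na, Ne, expected heterozygosity, and the information index) can be illustrated with their standard population-genetic definitions. This is a minimal sketch assuming those textbook formulas. The abstract does not state which estimators or software the authors used, and the allele counts below are hypothetical, not data from the study.

```python
import math


def allele_frequencies(allele_counts):
    """Convert a mapping of allele name to count into a list of frequencies."""
    total = sum(allele_counts.values())
    return [count / total for count in allele_counts.values() if count > 0]


def number_of_alleles(freqs):
    """Na: number of distinct alleles observed at the locus."""
    return len(freqs)


def effective_number_of_alleles(freqs):
    """Ne = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs)


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2), without a small-sample correction."""
    return 1.0 - sum(p * p for p in freqs)


def shannon_information_index(freqs):
    """I = -sum(p_i * ln(p_i))."""
    return -sum(p * math.log(p) for p in freqs)


# Hypothetical allele counts for one locus (480 allele copies in total).
polymorphic = allele_frequencies({"A": 240, "B": 120, "C": 120})
# A monomorphic locus carries a single allele.
monomorphic = allele_frequencies({"A": 480})

for freqs in (polymorphic, monomorphic):
    print(
        number_of_alleles(freqs),
        round(effective_number_of_alleles(freqs), 3),
        round(expected_heterozygosity(freqs), 3),
        round(shannon_information_index(freqs), 3),
    )
```

For a single locus these definitions give Ne = 1 / (1 - He). Averages taken over several loci need not satisfy that identity, so a reported mean Ne and mean heterozygosity can differ from it.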

    ANALYSIS OF MORPHOMETRIC PARAMETERS DUCK EGGS OF LOCAL BREED SHAOXING

    The efficiency of industrial poultry farming, as poultry technology is optimized, depends on the genetic potential of the flock. The breeding characteristics of Shaoxing ducks make the breed well suited to farming in the People's Republic of China. The study aims to evaluate the morphometric characteristics of eggs from Shaoxing ducks bred on the breeding farm of Zhejiang Generation Biological Science and Technology Co., Ltd in Zhuji, Zhejiang Province, China. The weight, length, and width of the eggs and the egg shape index were determined. The number of eggs laid by each Shaoxing duck was counted individually over 4 consecutive months. The average egg weight is 67.45 ± 0.22 g, with a minimum of 45 g and a maximum of 89 g. The average egg length is 6.02 ± 0.01 cm, and the average width is 4.45 ± 0.01 cm. The duck egg shape index is 74.01 ± 0.12. Systematic individual studies of the morphometric parameters of eggs will therefore increase the effect of selection by expanding the indicators used for lifelong assessment of the female breeding stock. Selecting females for the breeding core of the breed according to the morphometric parameters of their eggs will increase the hatching yield of ducklings and, accordingly, will be one of the effective mechanisms for ensuring the economic profitability of breeding Shaoxing ducks.
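The egg shape index is conventionally the egg's width expressed as a percentage of its length. The abstract does not state its formula, so the definition below is an assumption. A short sketch applies it to the mean dimensions quoted above.

```python
def egg_shape_index(width_cm, length_cm):
    """Shape index as width / length * 100 (assumed conventional definition)."""
    return 100.0 * width_cm / length_cm


# Mean width and length as reported in the abstract.
index_of_means = egg_shape_index(width_cm=4.45, length_cm=6.02)
print(round(index_of_means, 2))
```

The value computed from the mean dimensions is close to, but not the same as, the reported 74.01 ± 0.12. That is expected if the reported figure is the mean of per-egg indices, because a mean of ratios generally differs from the ratio of the means.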

    Controllable Emotion Transfer For End-to-End Speech Synthesis

    Differences of methanogenesis between mesophilic and thermophilic in situ biogas-upgrading systems by hydrogen addition

    Full text link
    To investigate the differences in microbial community structure between mesophilic and thermophilic in situ biogas-upgrading systems by H2 addition, two reactors (35 °C and 55 °C) were run for four stages according to different H2 addition rates (H2/CO2 of 0:1, 1:1, and 4:1) and mixing modes (intermittent and continuous). 16S rRNA gene-sequencing technology was applied to analyze microbial community structure. The results showed that temperature is a crucial factor affecting the succession of microbial community structure and the H2 utilization pathway. For mesophilic digestion, most of the added H2 was consumed indirectly by the combination of homoacetogens and strict aceticlastic methanogens. In the thermophilic system, most of the added H2 may have been used for microbial cell growth, and part of the H2 was utilized directly by strict hydrogenotrophic methanogens and facultative aceticlastic methanogens. Continuous stirring was harmful to the stabilization of the mesophilic system, but not to the thermophilic one.
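The direct and indirect H2 utilization pathways described above lead to the same overall stoichiometry, which is also why an H2/CO2 ratio of 4:1 is a natural upper setting. The sketch below checks this with the textbook reactions for homoacetogenesis, aceticlastic methanogenesis, and hydrogenotrophic methanogenesis. The abstract does not write these reactions out, so treating them as the ones intended is an assumption.

```python
# Each reaction maps a species to its stoichiometric coefficient:
# negative for reactants, positive for products.
HOMOACETOGENESIS = {"H2": -4, "CO2": -2, "CH3COOH": 1, "H2O": 2}
ACETICLASTIC_METHANOGENESIS = {"CH3COOH": -1, "CH4": 1, "CO2": 1}
HYDROGENOTROPHIC_METHANOGENESIS = {"H2": -4, "CO2": -1, "CH4": 1, "H2O": 2}


def add_reactions(*reactions):
    """Sum reactions species by species and drop intermediates that cancel."""
    net = {}
    for reaction in reactions:
        for species, coefficient in reaction.items():
            net[species] = net.get(species, 0) + coefficient
    return {s: c for s, c in net.items() if c != 0}


# Indirect route: H2 -> acetate (homoacetogens) -> CH4 (aceticlastic methanogens).
indirect = add_reactions(HOMOACETOGENESIS, ACETICLASTIC_METHANOGENESIS)

# The acetate cancels, leaving 4 H2 + CO2 -> CH4 + 2 H2O in both cases.
print(indirect == HYDROGENOTROPHIC_METHANOGENESIS)
```

Both routes consume four H2 per CO2 converted, so the pathways differ in which organisms carry the flux rather than in the net gas balance.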