Coqui - Advanced Text-to-Speech Library

  •        853

Coqui TTS is a library for advanced Text-to-Speech generation. It's built on the latest research, was designed to achieve the best trade-off among ease-of-training, speed and quality. Coqui TTS comes with pretrained models, tools for measuring dataset quality and already used in 20+ languages for products and research projects.

Features:

  • High-performance Deep Learning models for Text2Speech tasks.
    • Text2Spec models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech).
    • Speaker Encoder to compute speaker embeddings efficiently.
    • Vocoder models (MelGAN, Multiband-MelGAN, GAN-TTS, ParallelWaveGAN, WaveGrad, WaveRNN)
  • Fast and efficient model training.
  • Detailed training logs on the terminal and Tensorboard.
  • Support for Multi-speaker TTS.
  • Efficient, flexible, lightweight but feature complete Trainer API.
  • Ability to convert PyTorch models to Tensorflow 2.0 and TFLite for inference.
  • Released and read-to-use models.
  • Tools to curate Text2Speech datasets underdataset_analysis.
  • Utilities to use and test your models.
  • Modular (but not too much) code base enabling easy implementation of new ideas.

https://coqui.ai/
https://github.com/coqui-ai/TTS/

Tags
Implementation
License
Platform

   




Related Projects

TensorFlowTTS - :stuck_out_tongue_closed_eyes: TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, Korean, Chinese, German and Easy to adapt for other languages)

  •    Python

🤪 TensorFlowTTS provides real-time state-of-the-art speech synthesis architectures such as Tacotron-2, Melgan, Multiband-Melgan, FastSpeech, FastSpeech2 based-on TensorFlow 2. With Tensorflow 2, we can speed-up training/inference progress, optimizer further by using fake-quantize aware and pruning, make TTS models can be run faster than real-time and be able to deploy on mobile devices or embedded systems. Different Tensorflow version should be working but not tested yet. This repo will try to work with the latest stable TensorFlow version. We recommend you install TensorFlow 2.3.0 to training in case you want to use MultiGPU.

Parakeet - PAddle PARAllel text-to-speech toolKIT

  •    Python

Parakeet aims to provide a flexible, efficient and state-of-the-art text-to-speech toolkit for the open-source community. It is built on PaddlePaddle Fluid dynamic graph and includes many influential TTS models proposed by Baidu Research and other research groups. In particular, it features the latest WaveFlow model proposed by Baidu Research.

dc_tts - A TensorFlow Implementation of DC-TTS: yet another text-to-speech model

  •    Python

I implement yet another text-to-speech model, dc-tts, introduced in Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention. My goal, however, is not just replicating the paper. Rather, I'd like to gain insights about various sound projects. I train English models and an Korean model on four different speech datasets.

tacotron - A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model

  •    Python

We train the model on three different speech datasets. LJ Speech Dataset is recently widely used as a benchmark dataset in the TTS task because it is publicly available. It has 24 hours of reasonable quality samples. Nick's audiobooks are additionally used to see if the model can learn even with less data, variable speech samples. They are 18 hours long. The World English Bible is a public domain update of the American Standard Version of 1901 into modern English. Its original audios are freely available here. Kyubyong split each chapter by verse manually and aligned the segmented audio clips to the text. They are 72 hours in total. You can download them at Kaggle Datasets.

mimic-recording-studio - Mimic Recording Studio is a Docker-based application you can install to record voice samples, which can then be trained into a TTS voice with Mimic2

  •    Javascript

The Mycroft open source Mimic technologies are Text-to-Speech engines which take a piece of written text and convert it into spoken audio. The latest generation of this technology, Mimic 2, uses machine learning techniques to create a model which can speak a specific language, sounding like the voice on which it was trained. The Mimic Recording Studio simplifies the collection of training data from individuals, each of which can be used to produce a distinct voice for Mimic.


Speect - Multilingual text-to-speech (TTS) system

  •    C

Speect is a multilingual text-to-speech (TTS) system. It offers a full TTS system (text analysis which decodes the text, and speech synthesis, which encodes the speech) with various API’s, as well as an environment for research and development of TTS systems and voices.

IBM TTS SDK

  •    C

This is a development package for IBM Text To Speech (TTS). It is intended to be used to build applications when a licensed ibmtts is not available. Only the ECI ABIs are provided. There is no TTS runtime code provided.

deepvoice3_pytorch - PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

  •    Python

Audio samples are available at https://r9y9.github.io/deepvoice3_pytorch/. NOTE: pretrained models are not compatible to master. To be updated soon.

Epos TTS System

  •    C++

Epos is a language independent rule-driven Text-to-Speech (TTS) system primarily designed to serve as a research tool. Epos is (or tries to be) independent of the language processed, linguistic description method, and computing environment.

loop - A method to generate speech across multiple speakers

  •    Python

PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop. VoiceLoop is a neural text-to-speech (TTS) that is able to transform text to speech in voices that are sampled in the wild. Some demo samples can be found here.

loop - A method to generate speech across multiple speakers

  •    Python

PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop. VoiceLoop is a neural text-to-speech (TTS) that is able to transform text to speech in voices that are sampled in the wild. Some demo samples can be found here.

pyannote-audio - Neural building blocks for speaker diarization: speech activity detection, speaker change detection, speaker embedding

  •    Python

Open Phd/postdoc positions at LIMSI combining machine learning, NLP, speech processing, and computer vision. If you use pyannote.audio in your research, please use the following citations.

Talkie - Text-to-speech browser extension button

  •    Javascript

Talkie is a Text-to-speech browser extension button. It lets you listen to the selected text on any part of a page — short snippets or entire news articles. Just highlight what you want to hear read aloud and hit play. Automatically detects the text language per-page, and chooses a voice in the same language to match it. Support is available for Chrome and Firefox.

lingvo - Lingvo

  •    Python

Lingvo is a framework for building neural networks in Tensorflow, particularly sequence models. A list of publications using Lingvo can be found here.

SAM - Software Automatic Mouth - Tiny Speech Synthesizer

  •    C

Sam is a very small Text-To-Speech (TTS) program written in C, that runs on most popular platforms. It is an adaption to C of the speech software SAM (Software Automatic Mouth) for the Commodore C64 published in the year 1982 by Don't Ask Software (now SoftVoice, Inc.). It includes a Text-To-Phoneme converter called reciter and a Phoneme-To-Speech routine for the final output. It is so small that it will work also on embedded computers. On my computer it takes less than 39KB (much smaller on embedded devices as the executable-overhead is not necessary) of disk space and is a fully stand alone program. For immediate output it uses the SDL-library, otherwise it can save .wav files. Simply type "make" in your command prompt. In order to compile without SDL remove the SDL statements from the CFLAGS and LFLAGS variables in the file "Makefile".

WP7 Text-to-Speech Tool & Translation Library

  •    

Windows Phone Text-to-Speech (wpTTS) produces speech from text strings. wpTTS also provides real-time translation between a select list of languages. (AppID required.)

Speech Server .NET

  •    CSharp

Speech Server .NET aims to add functionalities of Text-To-Speech (TTS) and Automatic Speech Recnognition (ASR) to handheld devices like Pocket PC and Smartphone, running Windows Mobile, that are wirelessly connected to a server. This server is able to generate a speech stream ...

aeneas - aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)

  •    Python

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment). aeneas automatically generates a synchronization map between a list of text fragments and an audio file containing the narration of the text. In computer science this task is known as (automatically computing a) forced alignment.

Voxx Speech Recognition Project

  •    VB

Written in VB 6 for Win98 and up. Our goal is to provide speech recognition and text to speech unlike any software currently in the market. Some features include TTS, Dictation using Microsoft SAPI 5.1 engines. Visit our Home Page for more info.






We have large collection of open source products. Follow the tags from Tag Cloud >>


Open source products are scattered around the web. Please provide information about the open source projects you own / you use. Add Projects.