Coqui TTS is a library for advanced Text-to-Speech generation. It's built on the latest research, was designed to achieve the best trade-off among ease-of-training, speed and quality. Coqui TTS comes with pretrained models, tools for measuring dataset quality and already used in 20+ languages for products and research projects.
Features:
Tags | text-to-speech deep-learning speech pytorch tts vocoder tacotron speaker-encodings tensorflow2 melgan speaker-encoder melgan-stft multi-speaker-tts glow-tts hifigan align-tts tts-model |
Implementation | Python |
License | MPL |
Platform | Windows Linux |
🤪 TensorFlowTTS provides real-time state-of-the-art speech synthesis architectures such as Tacotron-2, Melgan, Multiband-Melgan, FastSpeech, FastSpeech2 based-on TensorFlow 2. With Tensorflow 2, we can speed-up training/inference progress, optimizer further by using fake-quantize aware and pruning, make TTS models can be run faster than real-time and be able to deploy on mobile devices or embedded systems. Different Tensorflow version should be working but not tested yet. This repo will try to work with the latest stable TensorFlow version. We recommend you install TensorFlow 2.3.0 to training in case you want to use MultiGPU.
text-to-speech real-time tts speech-synthesis vocoder tflite voice-cloning tensorflow2 fastspeech tacotron2 melgan multi-speaker-tts multiband-melgan fastspeech2 parallel-wavegan mobile-tts zh-tts chinese-tts korea-tts german-ttsParakeet aims to provide a flexible, efficient and state-of-the-art text-to-speech toolkit for the open-source community. It is built on PaddlePaddle Fluid dynamic graph and includes many influential TTS models proposed by Baidu Research and other research groups. In particular, it features the latest WaveFlow model proposed by Baidu Research.
text-to-speech speech-synthesis voice-cloning ge2e tacotron2 multi-speaker-tts fastspeech2 waveflow transformer-tts parallelwavegan speedyspeech text-frontendI implement yet another text-to-speech model, dc-tts, introduced in Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention. My goal, however, is not just replicating the paper. Rather, I'd like to gain insights about various sound projects. I train English models and an Korean model on four different speech datasets.
speech speech-to-text ttsWe train the model on three different speech datasets. LJ Speech Dataset is recently widely used as a benchmark dataset in the TTS task because it is publicly available. It has 24 hours of reasonable quality samples. Nick's audiobooks are additionally used to see if the model can learn even with less data, variable speech samples. They are 18 hours long. The World English Bible is a public domain update of the American Standard Version of 1901 into modern English. Its original audios are freely available here. Kyubyong split each chapter by verse manually and aligned the segmented audio clips to the text. They are 72 hours in total. You can download them at Kaggle Datasets.
tts tensorflow speech-synthesis-model speechThe Mycroft open source Mimic technologies are Text-to-Speech engines which take a piece of written text and convert it into spoken audio. The latest generation of this technology, Mimic 2, uses machine learning techniques to create a model which can speak a specific language, sounding like the voice on which it was trained. The Mimic Recording Studio simplifies the collection of training data from individuals, each of which can be used to produce a distinct voice for Mimic.
docker voice microphone tts mycroft hacktoberfest recording-studio tacotron mimic mycroftai tts-engineSpeect is a multilingual text-to-speech (TTS) system. It offers a full TTS system (text analysis which decodes the text, and speech synthesis, which encodes the speech) with various API’s, as well as an environment for research and development of TTS systems and voices.
text-to-speech text analysis speechThis is a development package for IBM Text To Speech (TTS). It is intended to be used to build applications when a licensed ibmtts is not available. Only the ECI ABIs are provided. There is no TTS runtime code provided.
Audio samples are available at https://r9y9.github.io/deepvoice3_pytorch/. NOTE: pretrained models are not compatible to master. To be updated soon.
tts speech-synthesis end-to-end speech-processing machine-learning english japanese pytorchEpos is a language independent rule-driven Text-to-Speech (TTS) system primarily designed to serve as a research tool. Epos is (or tries to be) independent of the language processed, linguistic description method, and computing environment.
PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop. VoiceLoop is a neural text-to-speech (TTS) that is able to transform text to speech in voices that are sampled in the wild. Some demo samples can be found here.
PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop. VoiceLoop is a neural text-to-speech (TTS) that is able to transform text to speech in voices that are sampled in the wild. Some demo samples can be found here.
Open Phd/postdoc positions at LIMSI combining machine learning, NLP, speech processing, and computer vision. If you use pyannote.audio in your research, please use the following citations.
pytorch speech-processing speaker-diarization lstm deep-learning speech-activity-detection speaker-change-detection speaker-embeddingTalkie is a Text-to-speech browser extension button. It lets you listen to the selected text on any part of a page — short snippets or entire news articles. Just highlight what you want to hear read aloud and hit play. Automatically detects the text language per-page, and chooses a voice in the same language to match it. Support is available for Chrome and Firefox.
text-to-speech web-speech-api tts speech-synthesis browser-extensionLingvo is a framework for building neural networks in Tensorflow, particularly sequence models. A list of publications using Lingvo can be found here.
nlp research translation tensorflow machine-translation speech distributed tts speech-synthesis mnist speech-recognition lm seq2seq speech-to-text gpu-computing language-model asrSam is a very small Text-To-Speech (TTS) program written in C, that runs on most popular platforms. It is an adaption to C of the speech software SAM (Software Automatic Mouth) for the Commodore C64 published in the year 1982 by Don't Ask Software (now SoftVoice, Inc.). It includes a Text-To-Phoneme converter called reciter and a Phoneme-To-Speech routine for the final output. It is so small that it will work also on embedded computers. On my computer it takes less than 39KB (much smaller on embedded devices as the executable-overhead is not necessary) of disk space and is a fully stand alone program. For immediate output it uses the SDL-library, otherwise it can save .wav files. Simply type "make" in your command prompt. In order to compile without SDL remove the SDL statements from the CFLAGS and LFLAGS variables in the file "Makefile".
speech-synthesis c64 reciter phonemesWindows Phone Text-to-Speech (wpTTS) produces speech from text strings. wpTTS also provides real-time translation between a select list of languages. (AppID required.)
mango metro text-to-speech tts windows-phone windows-phone-7Speech Server .NET aims to add functionalities of Text-To-Speech (TTS) and Automatic Speech Recnognition (ASR) to handheld devices like Pocket PC and Smartphone, running Windows Mobile, that are wirelessly connected to a server. This server is able to generate a speech stream ...
speech asr compact-framework library mobile nectarSee http://gtts.readthedocs.org/ for documentation and examples.
speech tts text-to-speech gttsaeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment). aeneas automatically generates a synchronization map between a list of text fragments and an audio file containing the narration of the text. In computer science this task is known as (automatically computing a) forced alignment.
speech alignment tts nlp espeak espeak-ng festival cli dtw ffmpeg forced-alignment text audio srt smil text-to-speechWritten in VB 6 for Win98 and up. Our goal is to provide speech recognition and text to speech unlike any software currently in the market. Some features include TTS, Dictation using Microsoft SAPI 5.1 engines. Visit our Home Page for more info.
We have large collection of open source products. Follow the tags from
Tag Cloud >>
Open source products are scattered around the web. Please provide information
about the open source projects you own / you use.
Add Projects.