MARY is an open-source, multilingual Text-to-Speech Synthesis platform written in Java. It supports German, British and American English, Telugu, Turkish, and Russian. 
 Demo: <A HREF="http://mary.dfki.de:59125/" target="_blank">http://mary.dfki.de:59125/</A>
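The demo above runs a MARY server on port 59125. Here is a minimal sketch of a client request. It assumes a local MaryTTS 5.x server and its HTTP `/process` endpoint. Verify the endpoint and parameter names against your server version. The `mary_process_url` helper is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: a MARY server running locally on the same port as the demo.
MARY_SERVER = "http://localhost:59125"


def mary_process_url(text: str, locale: str = "en_US", server: str = MARY_SERVER) -> str:
    """Build a GET URL that asks a MARY server to synthesize `text` as WAV audio.

    Assumption: the /process endpoint and these parameter names follow the
    MaryTTS 5.x HTTP interface; check them against your server version.
    """
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",      # plain text in
        "OUTPUT_TYPE": "AUDIO",    # synthesized audio out
        "AUDIO": "WAVE_FILE",      # as a complete WAV file
        "LOCALE": locale,          # e.g. "de", "en_US", "en_GB"
    }
    return f"{server}/process?{urlencode(params)}"
```

Fetching the resulting URL, for example with `urllib.request.urlopen(url).read()`, should return the WAV bytes. Building the URL separately keeps the request easy to inspect and log.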

MARY - Text-to-Speech System

The Hidden Markov Model Toolkit (HTK) is a portable toolkit for building and manipulating hidden Markov models. HTK is primarily used for speech recognition research, although it has been used for numerous other applications, including research into speech synthesis, character recognition and DNA sequencing. HTK is in use at hundreds of sites worldwide.
 
HTK consists of a set of library modules and tools available in C source form. The tools provide sophisticated facilities for speech analysis, HMM training, testing and results analysis. The software supports HMMs using both continuous density mixture Gaussians and discrete distributions and can be used to build complex HMM systems. The HTK release contains extensive documentation and examples.
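The following sketch makes the two emission models concrete. It is library-independent Python, not HTK's API, since HTK itself is a set of C tools. A continuous density state scores a feature vector with a diagonal-covariance Gaussian mixture. A discrete state looks up the probability of a vector-quantized symbol. All function names are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def log_gmm(x, weights, means, variances):
    """Log-likelihood of x under a Gaussian mixture (continuous density state).

    Uses log-sum-exp so that very small component likelihoods do not underflow.
    """
    logs = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))


def log_discrete(symbol, probs):
    """Log emission probability of a codebook symbol (discrete state)."""
    return math.log(probs[symbol])
```

Working in the log domain is the usual choice for HMM scoring. Products of many small probabilities over a long utterance would otherwise underflow.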

HTK - Speech Recognition Toolkit

Coqui TTS is a library for advanced Text-to-Speech generation. It is built on the latest research and was designed to achieve the best trade-off among ease of training, speed and quality. Coqui TTS comes with pretrained models and tools for measuring dataset quality, and it is already used in 20+ languages for products and research projects.
 
Features: 
<ul style="margin-bottom: 16px; padding-left: 2em;"><li style="">High-performance Deep Learning models for Text2Speech tasks.<ul style="padding-left: 2em;"><li>Text2Spec models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech).</li><li style="margin-top: 0.25em;">Speaker Encoder to compute speaker embeddings efficiently.</li><li style="margin-top: 0.25em;">Vocoder models (MelGAN, Multiband-MelGAN, GAN-TTS, ParallelWaveGAN, WaveGrad, WaveRNN).</li></ul></li><li style="margin-top: 0.25em;">Fast and efficient model training.</li><li style="margin-top: 0.25em;">Detailed training logs on the terminal and TensorBoard.</li><li style="margin-top: 0.25em;">Support for multi-speaker TTS.</li><li style="margin-top: 0.25em;">Efficient, flexible, lightweight but feature-complete Trainer API.</li><li style="margin-top: 0.25em;">Ability to convert PyTorch models to TensorFlow 2.0 and TFLite for inference.</li><li style="margin-top: 0.25em;">Released and ready-to-use models.</li><li style="margin-top: 0.25em;">Tools to curate Text2Speech datasets under dataset_analysis.</li><li style="margin-top: 0.25em;">Utilities to use and test your models.</li><li style="margin-top: 0.25em;">Modular (but not too much) code base enabling easy implementation of new ideas.</li></ul>

Coqui - Advanced Text-to-Speech Library

SpeakRight is a Java framework for writing speech recognition applications in VoiceXML. It generates VoiceXML dynamically using the popular StringTemplate templating framework. Although VoiceXML uses a web architecture similar to that of HTML, the needs of a speech app are very different. SpeakRight lives in the application code layer, typically in a servlet, and its runtime dynamically generates VoiceXML pages, one per HTTP request.
Applications are written in Java using SpeakRight's extensible classes.
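SpeakRight itself is written in Java and renders its pages with StringTemplate. The sketch below is a rough Python analogue that uses the standard library's `string.Template` to render one VoiceXML page per request. The page layout, grammar file and function names are hypothetical. They only illustrate the one-template-per-page pattern.

```python
from string import Template
from xml.sax.saxutils import escape

# Hypothetical VoiceXML 2.0 page: one field that plays a prompt, listens
# against an SRGS grammar, and submits the answer back to the application.
PAGE = Template("""<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="$form_id">
    <field name="$field">
      <prompt>$prompt</prompt>
      <grammar src="$grammar" type="application/srgs+xml"/>
      <filled>
        <submit next="$next_url" namelist="$field"/>
      </filled>
    </field>
  </form>
</vxml>
""")


def _attr(value: str) -> str:
    """Escape a value for use inside a double-quoted XML attribute."""
    return escape(value, {'"': "&quot;"})


def render_page(form_id: str, field: str, prompt: str, grammar: str, next_url: str) -> str:
    """Render one VoiceXML page; a server would call this once per HTTP request."""
    return PAGE.substitute(
        form_id=_attr(form_id),
        field=_attr(field),
        prompt=escape(prompt),
        grammar=_attr(grammar),
        next_url=_attr(next_url),
    )
```

For example, `render_page("ask_city", "city", "Which city are you flying to?", "cities.grxml", "/app?step=2")` yields a page whose `<submit>` sends the recognized `city` back to the server, which then renders the next page. Escaping every substituted value keeps the generated page well-formed even when prompts contain `<` or `&`.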

SpeakRight Framework - Helps to build Speech Recognition Applications

FreeTTS is a speech synthesis system written entirely in Java. It is based on Flite, a small run-time speech synthesis engine developed at Carnegie Mellon University. Flite is derived from the Festival Speech Synthesis System from the University of Edinburgh and the FestVox project from Carnegie Mellon University. FreeTTS supports a subset of the JSAPI 1.0 Java speech synthesis specification.

FreeTTS - Speech Synthesizer in Java

The Festvox project aims to make the building of new synthetic voices more systematic and better documented, making it possible for anyone to build a new voice. Festvox is the basis of many speech synthesis libraries.

Festvox - Builds New Synthetic Voices

Simon is an open source speech recognition program that can replace your mouse and keyboard. The system is designed to be as flexible as possible and will work with any language or dialect. It is a real dictation system.

Simon - Speech Recognition and Dictation System

The CMUSphinx toolkit is a speech recognition toolkit with various tools for building speech applications. It has a number of packages for different tasks: Pocketsphinx, a lightweight recognizer library written in C; Sphinxbase, the support library required by Pocketsphinx; Sphinx4, an adjustable, modifiable recognizer written in Java; CMUclmtk, language model tools; Sphinxtrain, acoustic model training tools; and Sphinx3, a decoder for speech recognition research written in C.


CMU Sphinx - Toolkit For Speech Recognition

Festival offers a general framework for building speech synthesis systems and includes examples of various modules. It offers full text-to-speech through a number of APIs, including the shell and a Scheme command interpreter. It has native support for Apple OS. It supports English and Spanish.

Festival - Speech Synthesis System

eSpeak is a compact open-source software speech synthesizer for English and other languages. It uses a formant synthesis method, which allows many languages to be provided in a small size. A SAPI5 version is available for Windows, so it can be used with screen readers and other programs that support the Windows SAPI5 interface. It can also translate text into phoneme codes, so it could be adapted as a front end for another speech synthesis engine.
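The sketch below shows eSpeak used as a phoneme front end. It assumes the `espeak` command-line program is on the `PATH`. Its `-q` option suppresses audio output and `-x` writes phoneme mnemonics to stdout. The Python wrapper and its names are hypothetical. The `run` parameter exists only so the subprocess call can be swapped out in tests.

```python
import subprocess
from typing import Callable, List


def espeak_phoneme_argv(text: str, voice: str = "en") -> List[str]:
    """Build an eSpeak command that prints phoneme mnemonics instead of speaking.

    -q  quiet: produce no audio
    -x  write phoneme mnemonics to stdout
    -v  select the voice/language
    """
    return ["espeak", "-q", "-x", "-v", voice, text]


def text_to_phonemes(text: str, voice: str = "en", run: Callable = subprocess.run) -> str:
    """Run eSpeak and return its phoneme string (requires `espeak` on PATH)."""
    result = run(
        espeak_phoneme_argv(text, voice),
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode stdout as str
        check=True,           # raise if espeak exits with an error
    )
    return result.stdout.strip()
```

A downstream synthesizer would map the returned mnemonics to its own phone set. The exact symbols depend on the eSpeak version and voice.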

eSpeak - Text to Speech

Flite (festival-lite) is a small, fast run-time synthesis engine developed at CMU and primarily designed for small embedded machines and/or large servers. Flite is designed as an alternative synthesis engine to Festival for voices built using the FestVox suite of voice building tools. 

Flite - Fast Run time Synthesis Engine

"Julius" is a high-performance, two-pass large vocabulary continuous speech recognition (LVCSR) decoder for speech-related researchers and developers. Based on word N-grams and context-dependent HMMs, it can perform almost real-time decoding on most current PCs in a 60k-word dictation task.
 
It fully incorporates major search techniques such as tree lexicon, N-gram factoring, cross-word context dependency handling, enveloped beam search, Gaussian pruning and Gaussian selection. It supports Windows SAPI.
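Beam pruning is one of the standard ways decoders stay near real time. At each step, hypotheses that score too far below the current best are dropped. The sketch below is a generic Python illustration of that idea. It combines score-threshold pruning with a cap on the number of survivors. It is not Julius's C implementation or its exact enveloped beam variant, and the names and parameters are hypothetical.

```python
import heapq
from typing import List, Tuple

Hyp = Tuple[str, float]  # (partial word sequence, log score)


def prune_beam(hyps: List[Hyp], score_beam: float, max_hyps: int) -> List[Hyp]:
    """Keep hypotheses within `score_beam` (log domain) of the best one,
    then retain at most `max_hyps` of them, best first."""
    if not hyps:
        return []
    best = max(score for _, score in hyps)
    survivors = [(h, s) for h, s in hyps if s >= best - score_beam]
    return heapq.nlargest(max_hyps, survivors, key=lambda hs: hs[1])
```

A wider `score_beam` or larger `max_hyps` trades speed for a lower risk of pruning away the correct hypothesis.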

Julius - Large Vocabulary CSR Engine

Speect is a multilingual text-to-speech (TTS) system. It offers a full TTS system (text analysis, which decodes the text, and speech synthesis, which encodes the speech) with various APIs, as well as an environment for research and development of TTS systems and voices.

Speect - Multilingual text-to-speech (TTS) system
