aeneas - aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment). aeneas automatically generates a synchronization map between a list of text fragments and an audio file containing the narration of the text. In computer science this task is known as (automatically computing a) forced alignment.
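To make "synchronization map" concrete, the sketch below builds the kind of structure aeneas outputs: each text fragment paired with a begin and end time. The field names (`fragments`, `id`, `begin`, `end`, `language`, `lines`, `children`) follow the aeneas JSON sync map output as I understand it. The helper `build_sync_map` and the example timings are hypothetical. The helper only shapes data, and the times are not computed by aeneas.

```python
import json
from typing import List, Tuple


def build_sync_map(fragments: List[str], boundaries: List[Tuple[float, float]],
                   language: str = "eng") -> dict:
    """Pair each text fragment with its (begin, end) time in seconds.

    Hypothetical helper: it formats already-known boundaries, it does not align audio.
    """
    if len(fragments) != len(boundaries):
        raise ValueError("need exactly one (begin, end) pair per fragment")
    entries = []
    for index, (text, (begin, end)) in enumerate(zip(fragments, boundaries), start=1):
        if begin > end:
            raise ValueError(f"fragment {index} ends before it begins")
        entries.append({
            "id": f"f{index:06d}",       # f000001, f000002, ...
            "begin": f"{begin:.3f}",     # seconds as strings, millisecond precision
            "end": f"{end:.3f}",
            "language": language,
            "lines": [text],
            "children": [],
        })
    return {"fragments": entries}


if __name__ == "__main__":
    # Made-up boundaries, for illustration only.
    example = build_sync_map(["First fragment.", "Second fragment."],
                             [(0.0, 2.68), (2.68, 5.88)])
    print(json.dumps(example, indent=1))
```

A player can then highlight `lines` while playback time lies between `begin` and `end`.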

http://www.readbeyond.it/aeneas/
https://github.com/readbeyond/aeneas
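The listing does not say how the map is computed. As I understand the aeneas documentation, it synthesizes the text with a TTS engine (eSpeak by default), extracts MFCC features from both the real and the synthesized audio, and aligns the two feature sequences with dynamic time warping (DTW) restricted to a Sakoe-Chiba band. The sketch below shows plain, unbanded DTW on 1-D features to illustrate the idea. The function names `dtw_path` and `map_boundaries` are hypothetical, not aeneas APIs.

```python
from typing import Dict, List, Sequence, Tuple


def dtw_path(real: Sequence[float], synth: Sequence[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic O(n*m) DTW with absolute-difference cost.

    Returns the total cost and the warping path as (real_index, synth_index) pairs.
    """
    n, m = len(real), len(synth)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # acc[i][j] = minimal cost of aligning real[:i+1] with synth[:j+1]
    acc = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            cost = abs(real[i] - synth[j])
            if i == 0 and j == 0:
                acc[i][j] = cost
                continue
            acc[i][j] = cost + min(
                acc[i - 1][j] if i > 0 else inf,                # real advances
                acc[i][j - 1] if j > 0 else inf,                # synth advances
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,  # both advance
            )
    # Backtrack from the last cell to recover the cheapest path.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return acc[n - 1][m - 1], path


def map_boundaries(path: List[Tuple[int, int]], synth_starts: List[int]) -> List[int]:
    """For each fragment's start frame in the synthesized audio, return the first aligned real frame."""
    first_real: Dict[int, int] = {}
    for r, s in path:
        first_real.setdefault(s, r)
    return [first_real[s] for s in synth_starts]
```

Because a DTW path advances the synthesized index by at most one frame per step, every synthesized frame appears in it, so each known fragment start in the synthesized audio maps to a real-audio frame. Multiplying by the feature frame hop turns that frame into a time.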

Related Projects

espeak - eSpeak NG is an open source speech synthesizer that supports around 100 languages and accents.

  •    C

The eSpeak NG (Next Generation) Text-to-Speech program is an open source speech synthesizer that supports 100 languages and accents. It is based on the eSpeak engine created by Jonathan Duddington. By default it uses spectral formant synthesis, which sounds robotic, but it can be configured to use Klatt formant synthesis or MBROLA for a more natural sound. See the CHANGELOG for a description of the changes in each release and relative to the original eSpeak project.
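Formant synthesis builds a vowel by passing a buzzy glottal source through resonators tuned to the vocal-tract formants. The toy sketch below uses the second-order digital resonator from Klatt's 1980 synthesizer, driven by an impulse train. It illustrates the technique only and is not eSpeak NG's code. The formant frequencies and bandwidths (roughly an /a/ vowel) and the function names are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


class Resonator:
    """Second-order digital resonator in the form used by Klatt (1980).

    y[n] = a*x[n] + b*y[n-1] + c*y[n-2], with a chosen so the gain at 0 Hz is 1.
    """

    def __init__(self, freq_hz: float, bandwidth_hz: float, sample_rate: int) -> None:
        t = 1.0 / sample_rate
        self.c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
        self.b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
        self.a = 1.0 - self.b - self.c
        self.y1 = 0.0  # previous output, y[n-1]
        self.y2 = 0.0  # output before that, y[n-2]

    def step(self, x: float) -> float:
        y = self.a * x + self.b * self.y1 + self.c * self.y2
        self.y2, self.y1 = self.y1, y
        return y


def synthesize_vowel(formants: Sequence[Tuple[float, float]], f0_hz: float = 120.0,
                     duration_s: float = 0.5, sample_rate: int = 16000) -> List[float]:
    """Impulse-train 'glottal' source passed through a cascade of formant resonators."""
    resonators = [Resonator(freq, bw, sample_rate) for freq, bw in formants]
    period = max(1, round(sample_rate / f0_hz))  # samples between glottal pulses
    out: List[float] = []
    for n in range(int(duration_s * sample_rate)):
        sample = 1.0 if n % period == 0 else 0.0
        for resonator in resonators:  # cascade: each formant filters the previous output
            sample = resonator.step(sample)
        out.append(sample)
    peak = max((abs(s) for s in out), default=0.0) or 1.0
    return [s / peak for s in out]  # normalize to the range [-1, 1]


# Illustrative (frequency, bandwidth) pairs in Hz, roughly an /a/ vowel.
A_VOWEL = [(730.0, 90.0), (1090.0, 110.0), (2440.0, 170.0)]
```

The result sounds mechanical, which is the "robotic" quality mentioned above. Scaling the samples to 16-bit integers and writing them with the standard `wave` module lets you listen to it.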

eSpeak - Text to Speech

  •    C

eSpeak is a compact open source software speech synthesizer for English and other languages. It uses a formant synthesis method, which allows many languages to be provided in a small size. A SAPI5 version is available for Windows, so it can be used with screen readers and other programs that support the Windows SAPI5 interface. It can also translate text into phoneme codes, so it could be adapted as a front end for another speech synthesis engine.
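Because eSpeak can print phoneme codes instead of speaking, another program can call it as a front end. The minimal sketch below assumes an `espeak` or `espeak-ng` executable is on `PATH`. It uses `-q` to suppress audio, `-x` to write phoneme mnemonics to stdout, and `-v` to select the voice. The helper names are hypothetical, and the exact phoneme strings depend on the eSpeak version and voice.

```python
import shutil
import subprocess
from typing import List, Optional


def phoneme_command(text: str, voice: str = "en", executable: str = "espeak") -> List[str]:
    """Build an eSpeak command line that prints phoneme mnemonics instead of speaking."""
    # -q: quiet (no audio output), -x: phoneme mnemonics to stdout, -v: voice/language
    return [executable, "-q", "-x", "-v", voice, text]


def text_to_phonemes(text: str, voice: str = "en") -> Optional[str]:
    """Run eSpeak (or eSpeak NG) if installed; return None when neither is available."""
    for exe in ("espeak", "espeak-ng"):
        if shutil.which(exe):
            result = subprocess.run(phoneme_command(text, voice, exe),
                                    capture_output=True, text=True, check=True)
            return result.stdout.strip()
    return None


if __name__ == "__main__":
    print(text_to_phonemes("hello world"))
```

A downstream synthesizer would parse this phoneme string instead of the raw text.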

asterisk-espeak

  •    C

eSpeak text-to-speech module for Asterisk. This provides the "espeak" dialplan application, which allows you to use the eSpeak TTS engine as a speech synthesizer in Asterisk.

speak.js - Text-to-Speech in JavaScript using eSpeak

  •    C++

A port of the eSpeak speech synthesizer from C++ to JavaScript using Emscripten. Enables text-to-speech on the web using only JavaScript and HTML5.

eSpeakIt

  •    Javascript

eSpeakIt is a Firefox extension that converts text to speech (using the espeak command) and plays the audio or saves it for use on portable media players. eSpeak must be installed for this to work (see http://espeak.sourceforge.net/).


Speect - Multilingual text-to-speech (TTS) system

  •    C

Speect is a multilingual text-to-speech (TTS) system. It offers a full TTS system (text analysis, which decodes the text, and speech synthesis, which encodes the speech) with various APIs, as well as an environment for research and development of TTS systems and voices.

tacotron - A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model

  •    Python

We train the model on three different speech datasets. The LJ Speech Dataset has recently become a widely used benchmark for the TTS task because it is publicly available; it contains 24 hours of reasonable-quality samples. Nick's audiobooks (18 hours) are additionally used to see whether the model can learn from less data and more variable speech samples. The World English Bible is a public-domain update of the American Standard Version of 1901 into modern English, and its original audio is freely available. Kyubyong manually split each chapter by verse and aligned the segmented audio clips to the text; these clips total 72 hours. You can download them from Kaggle Datasets.
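As a concrete starting point for the first dataset: the LJ Speech release ships a pipe-delimited `metadata.csv` with one clip per line (ID, raw transcription, normalized transcription) alongside `wavs/<ID>.wav`. The sketch below reads that layout. It is not code from the tacotron repository. The `root` directory and the names `Clip` and `read_ljspeech_metadata` are assumptions, and the sample line is made up.

```python
import csv
import io
from pathlib import Path
from typing import Iterator, NamedTuple, TextIO


class Clip(NamedTuple):
    clip_id: str
    wav_path: Path
    text: str              # raw transcription
    normalized_text: str   # numbers, abbreviations etc. spelled out


def read_ljspeech_metadata(handle: TextIO, root: Path) -> Iterator[Clip]:
    """Yield one Clip per line of an LJ Speech-style pipe-delimited metadata file."""
    # QUOTE_NONE: transcripts contain literal quotation marks that must not be parsed as CSV quoting.
    reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        clip_id, text = row[0], row[1]
        normalized = row[2] if len(row) > 2 else text  # fall back to the raw text
        yield Clip(clip_id, root / "wavs" / f"{clip_id}.wav", text, normalized)


if __name__ == "__main__":
    # Made-up line in the LJ Speech layout, for illustration only.
    sample = "LJ000-0001|Dr. Lee read 2 pages.|Doctor Lee read two pages.\n"
    for clip in read_ljspeech_metadata(io.StringIO(sample), Path("LJSpeech-1.1")):
        print(clip.clip_id, clip.wav_path, clip.normalized_text)
```

In practice you would open `root / "metadata.csv"` with `encoding="utf-8"` and pass that handle in.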

Audible Alerts

  •    C

Audible Alerts is a Pidgin (libpurple) plugin that announces incoming IMs audibly. It uses eSpeak, a text-to-speech program, to call out the name of the buddy who IMs you. You can download eSpeak from www.sourceforge.net/projects/espeak

speak-js - Text-to-Speech in JavaScript

  •    Javascript

A port of the eSpeak speech synthesizer from C++ to JavaScript using Emscripten. Enables text-to-speech on the web using only JavaScript and HTML5.

dc_tts - A TensorFlow Implementation of DC-TTS: yet another text-to-speech model

  •    Python

I implement yet another text-to-speech model, dc-tts, introduced in Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention. My goal, however, is not just to replicate the paper. Rather, I'd like to gain insights about various sound projects. I train English models and a Korean model on four different speech datasets.
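The "guided attention" in the paper's title is a loss term that penalizes attention far from the diagonal, so that text position n and output time step t advance together. As I recall from the DC-TTS paper, the weight is W[n][t] = 1 - exp(-(n/N - t/T)^2 / (2 g^2)) with g = 0.2, and the loss is the mean of A[n][t] * W[n][t] over the attention matrix A. The plain-Python sketch below computes both. It is not code from the dc_tts repository, which uses TensorFlow, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def guided_attention_weights(n_text: int, n_frames: int, g: float = 0.2) -> List[List[float]]:
    """W[n][t] = 1 - exp(-((n/N - t/T)**2) / (2*g**2)): 0 on the diagonal, near 1 far from it."""
    return [
        [1.0 - math.exp(-((n / n_text - t / n_frames) ** 2) / (2.0 * g * g))
         for t in range(n_frames)]
        for n in range(n_text)
    ]


def guided_attention_loss(attention: Sequence[Sequence[float]], g: float = 0.2) -> float:
    """Mean of A[n][t] * W[n][t] over all text positions n and output frames t."""
    n_text, n_frames = len(attention), len(attention[0])
    weights = guided_attention_weights(n_text, n_frames, g)
    total = sum(attention[n][t] * weights[n][t]
                for n in range(n_text) for t in range(n_frames))
    return total / (n_text * n_frames)
```

During training this term would be added to the model's other losses. An attention matrix that stays near the diagonal costs almost nothing, while one that jumps around is penalized.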

OpenNLP - Machine learning based toolkit for the processing of natural language text

  •    Java

The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services. OpenNLP also includes maximum entropy and perceptron based machine learning.
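To make one of these tasks concrete: sentence segmentation decides which periods, question marks and exclamation marks actually end a sentence. OpenNLP learns that decision with a trained model. The sketch below substitutes a much cruder rule-based heuristic, a hypothetical abbreviation list plus a capital-letter check, purely to show the shape of the task. It is not OpenNLP's API or algorithm.

```python
import re
from typing import List

# Hypothetical, deliberately tiny abbreviation list; a trained model learns such cues instead.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "e.g.", "i.e.", "etc.", "vs."}

# Candidate boundary: '.', '!' or '?' followed by whitespace, then an uppercase letter or a quote.
_CANDIDATE = re.compile(r"[.!?]\s+(?=[\"'A-Z])")


def split_sentences(text: str) -> List[str]:
    sentences: List[str] = []
    start = 0
    for match in _CANDIDATE.finditer(text):
        end = match.start() + 1  # keep the punctuation mark with its sentence
        last_word = text[start:end].split()[-1].lower()
        if last_word in ABBREVIATIONS:
            continue  # "Dr. Smith" is not a sentence boundary
        sentences.append(text[start:end].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

Rules like these break quickly, for example on sentences that start with a lowercase word or an unknown abbreviation. That brittleness is why toolkits such as OpenNLP train statistical models for the task.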

Indian Speech Synthesis System (festival)

festival-in will host, under its umbrella, separate speech synthesis systems for the respective Indian languages, based on the "festival" TTS (text-to-speech) engine. It will have modules (tokenizer and lexical) for each Indian language.

Festival - Speech Synthesis System

  •    C++

Festival offers a general framework for building speech synthesis systems, as well as examples of various modules. It offers full text-to-speech through a number of APIs, including from the shell and through a Scheme command interpreter. It has native support for Apple OS. It supports English and Spanish.

Talkie - Text-to-speech browser extension button

  •    Javascript

Talkie is a text-to-speech browser extension button. It lets you listen to selected text on any part of a page, from short snippets to entire news articles: just highlight what you want to hear read aloud and hit play. It automatically detects the text language per page and chooses a voice in the same language to match. Chrome and Firefox are supported.

speech2text - Using Google Speech to Text API Provide a Simple Interface to Convert Audio Files

  •    Ruby

Using ffmpeg, FLAC, Google's speech-to-text API and Ruby, this project offers a simple interface for experimenting with converting speech to text. It provides a very simple Ruby API for decoding simple audio to text.
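The description names ffmpeg and FLAC as the audio-preparation step. The minimal sketch below covers that step. It assumes the `ffmpeg` binary is installed and converts any input ffmpeg can read into mono FLAC, with ffmpeg inferring the codec from the `.flac` extension. The 16 kHz sample rate is an assumption (a common choice for speech recognition), not something the project states. The helper names are hypothetical, and the sketch is written in Python rather than the project's Ruby.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def flac_command(src: Path, dst: Path, sample_rate: int = 16000) -> List[str]:
    """ffmpeg arguments: overwrite output (-y), read src (-i), downmix to mono (-ac 1), resample (-ar)."""
    return ["ffmpeg", "-y", "-i", str(src), "-ac", "1", "-ar", str(sample_rate), str(dst)]


def convert_to_flac(src: Path, sample_rate: int = 16000) -> Path:
    """Write <src stem>.flac next to src and return its path."""
    if src.suffix.lower() == ".flac":
        raise ValueError("input is already FLAC; choose a different output path")
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg is not installed or not on PATH")
    dst = src.with_suffix(".flac")
    subprocess.run(flac_command(src, dst, sample_rate), check=True, capture_output=True)
    return dst
```

The resulting FLAC file is what would then be uploaded to the recognition service.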

eSpeak: speech synthesis

  •    C++

Text to Speech engine for English and many other languages. Compact size with clear but artificial pronunciation. Available as a command-line program with many options, a shared library for Linux, and a Windows SAPI5 version.

WP7 Text-to-Speech Tool & Translation Library

Windows Phone Text-to-Speech (wpTTS) produces speech from text strings. wpTTS also provides real-time translation between a select list of languages. (AppID required.)

p5.speech - Web Audio Speech Synthesis / Recognition for p5.js

  •    Javascript

p5.speech is a JavaScript library that provides simple, clear access to the Web Speech and Speech Recognition APIs, allowing for the easy creation of sketches that can talk and listen. It consists of two object classes (p5.Speech and p5.SpeechRec) along with accessor functions to speak and listen for text, change parameters (synthesis voices, recognition models, etc.), and retrieve callbacks from the system. Speech recognition requires launching the sketch from a server (e.g. a simple Python HTTP server on the local machine).
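A local static file server is enough for the server requirement. With Python 3 the usual one-liner is `python -m http.server` (the Python 2 equivalent was `python -m SimpleHTTPServer`). The sketch below does the same thing from a script using only the standard library, which is handy when the server should start and stop programmatically. The function name and the default port 8000 are assumptions.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def start_static_server(directory: str = ".", port: int = 8000) -> ThreadingHTTPServer:
    """Serve `directory` at http://localhost:<port>/ from a background daemon thread."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)  # port 0 picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, `srv = start_static_server("my-sketch")` serves the sketch folder at `http://localhost:8000/`. Call `srv.shutdown()` and then `srv.server_close()` when finished.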

mimic - Mycroft's TTS engine, based on CMU's Flite (Festival Lite)

  •    C

Mimic is a fast, lightweight text-to-speech engine developed by Mycroft A.I. and VocaliD, based on Carnegie Mellon University’s Flite (Festival-Lite) software. Mimic takes in text and reads it out loud in a high-quality voice.