Displaying 1 to 20 from 31 results

Leon - Your open-source personal assistant

  •    Python

Leon is an open-source personal assistant who can live on your server. He does stuff when you ask him for. You can talk to him and he can talk to you. You can also text him and he can also text you. If you want to, Leon can communicate with you by being offline to protect your privacy.

espnet - End-to-End Speech Processing Toolkit

  •    Shell

ESPnet is an end-to-end speech processing toolkit, mainly focuses on end-to-end speech recognition, and end-to-end text-to-speech. ESPnet uses chainer and pytorch as a main deep learning engine, and also follows Kaldi style data processing, feature extraction/format, and recipes to provide a complete setup for speech recognition and other speech processing experiments. To use cuda (and cudnn), make sure to set paths in your .bashrc or .bash_profile appropriately.

deepvoice3_pytorch - PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

  •    Python

Audio samples are available at https://r9y9.github.io/deepvoice3_pytorch/. NOTE: pretrained models are not compatible to master. To be updated soon.

kalliope - Kalliope is a framework that will help you to create your own personal assistant.

  •    Python

Kalliope is a framework that will help you to create your own personal assistant. The concept is to create the brain of your assistant by attaching an input signal (vocal order, scheduled event, MQTT message, GPIO event, etc..) to one or multiple actions called neurons.




merlin - This is now the official location of the Merlin project.

  •    Python

This repository contains the Neural Network (NN) based Speech Synthesis System developed at the Centre for Speech Technology Research (CSTR), University of Edinburgh.Merlin is a toolkit for building Deep Neural Network models for statistical parametric speech synthesis. It must be used in combination with a front-end text processor (e.g., Festival) and a vocoder (e.g., STRAIGHT or WORLD).

termit - Translations with speech synthesis in your terminal as a ruby gem

  •    Ruby

Termit is an easy way to translate stuff in your terminal. You can check out its node.js npm version normit.Idea by Nedomas. See and hear your messages translated to target lang every time you commit. You can do this two ways: overriding the git command, and using a post-commit hook in git.

wavenet_vocoder - WaveNet vocoder

  •    Python

The goal of the repository is to provide an implementation of the WaveNet vocoder, which can generate high quality raw speech samples conditioned on linguistic or acoustic features. Audio samples are available at https://r9y9.github.io/wavenet_vocoder/.

Talkie - Text-to-speech browser extension button

  •    Javascript

Talkie is a Text-to-speech browser extension button. It lets you listen to the selected text on any part of a page — short snippets or entire news articles. Just highlight what you want to hear read aloud and hit play. Automatically detects the text language per-page, and chooses a voice in the same language to match it. Support is available for Chrome and Firefox.


p5.speech - Web Audio Speech Synthesis / Recognition for p5.js

  •    Javascript

p5.speech is a JavaScript library that provides simple, clear access to the Web Speech and Speech Recognition APIs, allowing for the easy creation of sketches that can talk and listen. It consists of two object classes (p5.Speech and p5.SpeechRec) along with accessor functions to speak and listen for text, change parameters (synthesis voices, recognition models, etc.), and retrieve callbacks from the system. Speech recognition requires launching from a server (e.g. a python simpleserver on a local machine).

espeak - eSpeak NG is an open source speech synthesizer that supports 99 languages and accents.

  •    C

The eSpeak NG (Next Generation) Text-to-Speech program is an open source speech synthesizer that supports 100 languages and accents. It is based on the eSpeak engine created by Jonathan Duddington. It uses spectral formant synthesis by default which sounds robotic, but can be configured to use Klatt formant synthesis or MBROLA to give it a more natural sound. See the CHANGELOG for a description of the changes in the various releases and with the eSpeak project.

Sagen - Free, open-source TTS engine for .NET

  •    CSharp

Sagen (German for "to say") is my attempt at making a text-to-speech engine aimed at .NET developers who don't have thousands of dollars at their disposal to license a commercial speech synthesis solution. In many ways, it is an experiment and continual learning experience for me, as I am not an expert in speech science, phonetics, or vocal acoustics; I simply want to see how far I can go with original research, freely available resources, and lots of patience.There are tons of TTS engines out there already, but I feel like they're all missing something.

normit - Google Translate with speech synthesis in your terminal as a node package

  •    Javascript

Normit is an easy way to translate stuff in your terminal. You can check out its Ruby gem version termit.I am no shell ninja so if you know how to make it work in bash then please submit a PR.

spoken-word - Spoken Word

  •    Javascript

Add text-to-speech (TTS) to content, with playback controls, read-along highlighting, multi-lingual support, and settings for rate, pitch, and voice. Add text-to-speech (TTS) to content, with playback controls, read-along highlighting, multi-lingual support, and settings for rate, pitch, and voice.

flite - Fork of CMU Flite 2

  •    C

Fork of CMU Flite 2.0.0 (http://www.festvox.org/flite/) with fix of memory leaks in the audio driver on Windows. Flite 2.1.0 was released and the official repository was created at https://github.com/festvox/flite. I'll make pull requests when I have time.

TinyCog - Small Robot, Toy Robot platform

  •    C++

A collection of speech, vision, and movement functionalities aimed at small or toy robots on embedded systems, such as the Raspberry Pi computer. High level reasoning, language understanding, language gneration and movement planning is provided by OpenCog. The current hardware platform requires an RPI3 computer, a Pi Camera V2 and a USB Microphone; other sensor/detector components are planned.

expressive_tacotron - Tensorflow Implementation of Expressive Tacotron

  •    Python

This project aims at implementing the paper, Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron, to verify its concept. Most of the baseline codes are based on my previous Tacotron implementation. LJ Speech Dataset is recently widely used as a benchmark dataset in the TTS task because it is publicly available. It has 24 hours of reasonable quality samples.

cross_vc - Cross-lingual Voice Conversion

  •    Python

I wish I could speak many languages. Wait. Actually I do. But only 4 or 5 languages with limited proficiency. Instead, can I create a voice model that can copy any voice in any language? Possibly! A while ago, me and my colleage Dabi opened a simple voice conversion project. Based on it, I expanded the idea to cross-languages. I found it's very challenging with my limited knowledge. Unfortunately, the results I have for now are not good, but hopefully it will be helpful for some people.

Cognitive-Speech-TTS - Microsoft Text-to-Speech API sample code in several languages, part of Cognitive Services

  •    CSharp

Microsoft also offers Neural TTS preview which can be invoked following the samples in this repo as well. What you need is to use a neural TTS endpoint. Recommend to run the CSharp example first which is always kept up to date.

PytorchWaveNetVocoder - WaveNet-Vocoder implementation with pytorch

  •    Shell

This repository is the wavenet-vocoder implementation with pytorch. Recommend to use the GPU with 10GB> memory.