node-ivona - IVONA Text-to-Speech SaaS client library for Node.js

  •    Javascript

ivona is a client library for the IVONA Text-to-Speech SaaS, written in Node.js. The client handles token authentication automatically, so API calls can be made right away.
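A minimal usage sketch is shown below. The credential fields and the listVoices() call are illustrative assumptions only, not taken from the project's README; check the repository for the actual option and method names.

```javascript
// Illustrative sketch only -- the credential field names and the listVoices()
// method are assumptions, not confirmed against the node-ivona README.
var Ivona = require('ivona');

// Token authentication is handled internally by the client,
// so credentials are supplied once up front.
var ivona = new Ivona({
  accessKey: 'YOUR_ACCESS_KEY',   // assumed credential fields
  secretKey: 'YOUR_SECRET_KEY'
});

// Assumed wrapper around one of the IVONA SaaS SOAP operations
// (the library depends on the `soap` package).
ivona.listVoices(function (err, voices) {
  if (err) return console.error(err);
  console.log(voices);
});
```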

https://github.com/pilwon/node-ivona

Dependencies:

lodash : ~1.3.1
soap : ~0.2.7

Related Projects

Speect - Multilingual text-to-speech (TTS) system

  •    C

Speect is a multilingual text-to-speech (TTS) system. It offers a full TTS pipeline (text analysis, which decodes the text, and speech synthesis, which encodes the speech) with various APIs, as well as an environment for research and development of TTS systems and voices.

tacotron - A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model

  •    Python

We train the model on three different speech datasets. The LJ Speech Dataset has recently become a widely used benchmark for TTS because it is publicly available; it contains 24 hours of reasonable-quality samples. Nick's audiobooks (18 hours in total) are additionally used to see whether the model can learn even from less, more variable speech data. The World English Bible is a public-domain update of the American Standard Version of 1901 into modern English; its original audio recordings are freely available here. Kyubyong manually split each chapter by verse and aligned the segmented audio clips to the text, yielding 72 hours in total. You can download them at Kaggle Datasets.

dc_tts - A TensorFlow Implementation of DC-TTS: yet another text-to-speech model

  •    Python

I implement yet another text-to-speech model, dc-tts, introduced in Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention. My goal, however, is not just to replicate the paper; rather, I'd like to gain insights into various sound projects. I train English models and a Korean model on four different speech datasets.

aeneas - aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)

  •    Python

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment). aeneas automatically generates a synchronization map between a list of text fragments and an audio file containing the narration of the text. In computer science this task is known as (automatically computing a) forced alignment.

WP7 Text-to-Speech Tool & Translation Library

  •    

Windows Phone Text-to-Speech (wpTTS) produces speech from text strings. wpTTS also provides real-time translation between a select list of languages. (AppID required.)


Speech Server .NET

  •    CSharp

Speech Server .NET aims to add Text-To-Speech (TTS) and Automatic Speech Recognition (ASR) functionality to handheld devices such as Pocket PCs and smartphones running Windows Mobile that are wirelessly connected to a server. This server is able to generate a speech stream ...

Talkie - Text-to-speech browser extension button

  •    Javascript

Talkie is a text-to-speech browser extension button. It lets you listen to the selected text on any part of a page, whether short snippets or entire news articles. Just highlight what you want to hear read aloud and hit play. Talkie automatically detects the text language per page and chooses a voice in the same language to match it. Support is available for Chrome and Firefox.

nodejs-speech - Node.js client for the Google Cloud Speech API

  •    Javascript

The Cloud Speech API enables easy integration of Google speech recognition technologies into developer applications. Send audio and receive a text transcription from the Cloud Speech API service. To get started, select or create a Cloud Platform project.
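A minimal sketch of that flow with the Node.js client is shown below; the file name, encoding, and sample rate are placeholder values, and a Cloud Platform project with credentials is assumed to be configured.

```javascript
// Minimal transcription sketch using the @google-cloud/speech client.
// The audio file, encoding, and sample rate below are placeholders.
const fs = require('fs');
const speech = require('@google-cloud/speech');

async function transcribe() {
  const client = new speech.SpeechClient(); // picks up your project credentials

  const request = {
    audio: { content: fs.readFileSync('audio.raw').toString('base64') },
    config: {
      encoding: 'LINEAR16',
      sampleRateHertz: 16000,
      languageCode: 'en-US',
    },
  };

  // Send audio, receive a text transcription.
  const [response] = await client.recognize(request);
  const transcript = response.results
    .map(result => result.alternatives[0].transcript)
    .join('\n');
  console.log(transcript);
}

transcribe().catch(console.error);
```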

sonus - :speech_balloon: /so.nus/ STT (speech to text) for Node with offline hotword detection

  •    Javascript

Sonus lets you quickly and easily add a VUI (Voice User Interface) to any hardware or software project. Just like Alexa, Google Now, and Siri, Sonus is always listening offline for a customizable hotword. Once that hotword is detected, your speech is streamed to the cloud recognition service of your choice, and then you get the results. Generally, running npm install should suffice; this module, however, also requires you to install SoX.
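A rough sketch of that hotword-then-stream flow is below. It loosely follows the project's documented pattern, but the option names and the hotword model file are assumptions, so verify them against the sonus README.

```javascript
// Sketch only -- option names follow the sonus README as best recalled;
// the hotword model file path is a placeholder.
const Sonus = require('sonus');
const speech = require('@google-cloud/speech');
const client = new speech.SpeechClient();   // cloud recognizer of your choice

// Offline hotword detection; detection happens locally, recognition in the cloud.
const hotwords = [{ hotword: 'sonus', file: 'resources/sonus.pmdl' }];
const sonus = Sonus.init({ hotwords, language: 'en-US' }, client);

Sonus.start(sonus);

sonus.on('hotword', (index, keyword) => console.log('Hotword detected:', keyword));
sonus.on('final-result', transcript => console.log('You said:', transcript));
```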

eSpeak - Text to Speech

  •    C

eSpeak is a compact open-source software speech synthesizer for English and other languages. eSpeak uses a formant synthesis method, which allows many languages to be provided in a small size. A SAPI5 version is available for Windows, so it can be used with screen readers and other programs that support the Windows SAPI5 interface. It can translate text into phoneme codes, so it could be adapted as a front end for another speech synthesis engine.
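eSpeak is typically driven from the command line; as a quick illustration, the sketch below shells out to the `espeak` binary from Node (assuming it is installed and on the PATH) using its standard -v, -w, -x, and -q options. File names are placeholders.

```javascript
// Drive the espeak command-line tool from Node.
const { execFile } = require('child_process');

// Synthesize a sentence with the English voice and write it to a WAV file.
execFile('espeak', ['-v', 'en', '-w', 'hello.wav', 'Hello from eSpeak'], (err) => {
  if (err) return console.error(err);
  console.log('Wrote hello.wav');
});

// Print the phoneme codes for a piece of text (-x, with -q for no audio),
// the feature that lets eSpeak act as a front end for another engine.
execFile('espeak', ['-x', '-q', 'Hello from eSpeak'], (err, stdout) => {
  if (err) return console.error(err);
  console.log(stdout.trim());
});
```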

react-native-speech - A text-to-speech library for React Native.

  •    Objective-C

React Native Speech is a text-to-speech library for React Native. In order to use Speech, you must first link the library to your project. There's excellent documentation on how to do this in the React Native Docs.

Voxx Speech Recognition Project

  •    VB

Written in VB 6 for Win98 and up. Our goal is to provide speech recognition and text-to-speech unlike any software currently on the market. Features include TTS and dictation using Microsoft SAPI 5.1 engines. Visit our Home Page for more info.

HTK - Speech Recognition Toolkit

  •    C

The Hidden Markov Model Toolkit (HTK) is a portable toolkit for building and manipulating hidden Markov models. HTK is primarily used for speech recognition research although it has been used for numerous other applications including research into speech synthesis, character recognition and DNA sequencing. HTK is in use at hundreds of sites worldwide.

loop - A method to generate speech across multiple speakers

  •    Python

PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop. VoiceLoop is a neural text-to-speech (TTS) model that is able to transform text to speech in voices that are sampled in the wild. Some demo samples can be found here.

p5.speech - Web Audio Speech Synthesis / Recognition for p5.js

  •    Javascript

p5.speech is a JavaScript library that provides simple, clear access to the Web Speech and Speech Recognition APIs, allowing for the easy creation of sketches that can talk and listen. It consists of two object classes (p5.Speech and p5.SpeechRec) along with accessor functions to speak and listen for text, change parameters (synthesis voices, recognition models, etc.), and retrieve callbacks from the system. Speech recognition requires launching from a server (e.g. a Python SimpleHTTPServer on a local machine).
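A small sketch using the two classes is shown below, based on the documented p5.Speech / p5.SpeechRec usage; serve it from a local server and grant microphone access for recognition to work.

```javascript
// p5.js sketch using p5.speech: one object speaks, the other listens.
let voice = new p5.Speech();                           // speech synthesis
let recognizer = new p5.SpeechRec('en-US', gotSpeech); // speech recognition

function setup() {
  createCanvas(200, 200);
  voice.speak('hello from p5.speech'); // say something through the browser voice
  recognizer.continuous = true;        // keep listening after each result
  recognizer.start();                  // needs mic permission and a local server
}

function gotSpeech() {
  if (recognizer.resultValue) {
    console.log(recognizer.resultString); // most recent recognized phrase
  }
}
```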

stt-benchmark - speech to text benchmark framework

  •    Python

This is a minimalist and extensible framework for benchmarking different speech-to-text engines. It has been developed and tested on Ubuntu 18.04 with Python 3.6. The framework was developed by Picovoice as part of the Cheetah project. Cheetah is Picovoice's speech-to-text engine, specifically designed for IoT applications. Deep learning has been the main driver of recent improvements in speech recognition, but due to the stringent compute/storage limitations of IoT platforms, these improvements have mostly benefited cloud-based engines. Picovoice's proprietary deep learning technology enables transferring these improvements to IoT platforms with a much lower CPU/memory footprint. The goal is to be able to run Cheetah on any platform with a C compiler and a few MB of memory.

chat.susi.ai - SUSI AI web client at https://chat.susi.ai/

  •    Javascript

Susi is an artificial intelligence system combining pattern matching, internet data, data-flow, and inference-engine principles. Through some ability to reflect, it can remember user input to produce deductions and personalized feedback. Its purpose is to explore the abilities of an artificial companion and to answer the remaining unanswered questions. The SUSI.AI web chat is a front end developed for web access to SUSI. If you get speech output, then the Web Speech API's speech synthesis is supported by your browser and the text-to-speech features of SUSI Web Chat will work. The Web Speech API is supported in all recent Chrome/Chromium browsers, as mentioned in the Mozilla Web Speech API docs. However, there are a few bugs with some Chromium versions; please check this link for how to fix them locally.
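A quick way to check that support is to run a few lines of Web Speech API synthesis in the browser console, along the lines of the sketch below.

```javascript
// Check for Web Speech API synthesis support (what SUSI Web Chat's
// text-to-speech relies on) and speak a test phrase if it is available.
if ('speechSynthesis' in window) {
  const utterance = new SpeechSynthesisUtterance('SUSI speech output works.');
  utterance.lang = 'en-US';
  window.speechSynthesis.speak(utterance);
} else {
  console.log('Speech synthesis is not supported in this browser.');
}
```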