HTK - Speech Recognition Toolkit

  •        2348

The Hidden Markov Model Toolkit (HTK) is a portable toolkit for building and manipulating hidden Markov models. HTK is primarily used for speech recognition research although it has been used for numerous other applications including research into speech synthesis, character recognition and DNA sequencing. HTK is in use at hundreds of sites worldwide.

HTK consists of a set of library modules and tools available in C source form. The tools provide sophisticated facilities for speech analysis, HMM training, testing and results analysis. The software supports HMMs using both continuous density mixture Gaussians and discrete distributions and can be used to build complex HMM systems. The HTK release contains extensive documentation and examples.

http://htk.eng.cam.ac.uk/

Tags
Implementation
License
Platform

   




Related Projects

Kaldi - Speech Recognition Toolkit

  •    C++

Kaldi is a Speech recognition research toolkit. It is similar in aims and scope to HTK. The goal is to have modern and flexible code, written in C++, that is easy to modify and extend.

CMU Sphinx - Toolkit For Speech Recognition

  •    C

CMUSphinx toolkit is a speech recognition toolkit with various tools used to build speech applications. CMU Sphinx toolkit has a number of packages for different tasks. Pocketsphinx — lightweight recognizer library written in C, Sphinxbase — support library required by Pocketsphinx, Sphinx4 — adjustable, modifiable recognizer written in Java, CMUclmtk — language model tools, Sphinxtrain — acoustic model training tools, Sphinx3 — decoder for speech recognition research written in C.

p5.speech - Web Audio Speech Synthesis / Recognition for p5.js

  •    Javascript

p5.speech is a JavaScript library that provides simple, clear access to the Web Speech and Speech Recognition APIs, allowing for the easy creation of sketches that can talk and listen. It consists of two object classes (p5.Speech and p5.SpeechRec) along with accessor functions to speak and listen for text, change parameters (synthesis voices, recognition models, etc.), and retrieve callbacks from the system. Speech recognition requires launching from a server (e.g. a python simpleserver on a local machine).

espnet - End-to-End Speech Processing Toolkit

  •    Shell

ESPnet is an end-to-end speech processing toolkit, mainly focuses on end-to-end speech recognition, and end-to-end text-to-speech. ESPnet uses chainer and pytorch as a main deep learning engine, and also follows Kaldi style data processing, feature extraction/format, and recipes to provide a complete setup for speech recognition and other speech processing experiments. To use cuda (and cudnn), make sure to set paths in your .bashrc or .bash_profile appropriately.

stt-benchmark - speech to text benchmark framework

  •    Python

This is a minimalist and extensible framework for benchmarking different speech-to-text engines. It has been developed and tested on Ubuntu 18.04 with Python3.6. This framework has been developed by Picovoice as part of the project Cheetah. Cheetah is Picovoice's speech-to-text engine specifically designed for IoT applications. Deep learning has been the main driver in recent improvements in speech recognition. But due to stringent compute/storage limitations of IoT platforms it is most beneficial to the cloud-based engines. Picovoice's proprietary deep learning technology enables transferring these improvements to IoT platforms with much lower CPU/memory footprint. The goal is to be able to run Cheetah on any platform with a C Compiler and a few MB of memory.


tensorflow-speech-recognition - 🎙Speech recognition using the tensorflow deep learning framework, sequence-to-sequence neural networks

  •    Python

Speech recognition using google's tensorflow deep learning framework, sequence-to-sequence neural networks. Replaces caffe-speech-recognition, see there for some background.

voice-elements - :speaker: Web Component wrapper to the Web Speech API, that allows you to do voice recognition and speech synthesis using Polymer

  •    HTML

Web Component wrapper to the Web Speech API, that allows you to do voice recognition (speech to text) and speech synthesis (text to speech) using Polymer. Or download as ZIP.

sonus - :speech_balloon: /so.nus/ STT (speech to text) for Node with offline hotword detection

  •    Javascript

Sonus lets you quickly and easily add a VUI (Voice User Interface) to any hardware or software project. Just like Alexa, Google Now, and Siri, Sonus is always listening offline for a customizable hotword. Once that hotword is detected your speech is streamed to the cloud recognition service of your choice - then you get the results. Generally, running npm install should suffice. This module however, requires you to install SoX.

SpeakRight Framework - Helps to build Speech Recognition Applications

  •    Java

SpeakRight is an Java framework for writing speech recognition applications in VoiceXML. Dynamic generation of VoiceXML is done using the popular StringTemplate templating framework. Although VoiceXML uses a similar web architecture as HTML, the needs of a speech app are very different. SpeakRight lives in application code layer, typically in a servlet. The SpeakRight runtime dynamically generates VoiceXML pages, one per HTTP request.

eSpeak - Text to Speech

  •    C

eSpeak is a compact open source software speech synthesizer for English and other languages. eSpeak uses a formant synthesis method. This allows many languages to be provided in a small size. It supports SAPI5 version for Windows, so it can be used with screen-readers and other programs that support the Windows SAPI5 interface. It can translate text into phoneme codes, so it could be adapted as a front end for another speech synthesis engine.

Festival - Speech Synthesis System

  •    C++

Festival offers a general framework for building speech synthesis systems as well as including examples of various modules. It offers full text to speech through a APIs via shell and though a Scheme command interpreter. It has native support for Apple OS. It supports English and Spanish languages.

MARY - Text-to-Speech System

  •    Java

MARY is an open-source, multilingual Text-to-Speech Synthesis platform written in Java. It supports German, British and American English, Telugu, Turkish, and Russian.

FreeTTS - Speech Synthesizer in Java

  •    Java

FreeTTS is a speech synthesis system written entirely in the Java. It is based upon Flite, a small run-time speech synthesis engine developed at Carnegie Mellon University. Flite is derived from the Festival Speech Synthesis System from the University of Edinburgh and the FestVox project from Carnegie Mellon University. FreeTTS supports a subset of the JSAPI 1.0 java speech synthesis specification.

SpeechKITT - 🗣 A flexible GUI for Speech Recognition

  •    Javascript

Speech KITT makes it easy to add a GUI to sites using Speech Recognition. Whether you are using annyang, a different library or webkitSpeechRecognition directly, KITT will take care of the GUI. Speech KITT provides a graphical interface for the user to start or stop Speech Recognition and see its current status. It can also help guide the user on how to interact with your site using their voice, providing instructions and sample commands. It can even be used to carry a natural conversation with the user, asking questions the user can answer with his voice, and then asking follow up questions.

annyang - :speech_balloon: Speech recognition for your site

  •    Javascript

A tiny javascript SpeechRecognition library that lets your users control your site with voice commands. annyang has no dependencies, weighs just 2 KB, and is free to use and modify under the MIT license.

speech_recognition - Speech recognition module for Python, supporting several engines and APIs, online and offline

  •    Python

Library for performing speech recognition, with support for several engines and APIs, online and offline. Quickstart: pip install SpeechRecognition. See the "Installing" section for more details.

DeepSpeech - A TensorFlow implementation of Baidu's DeepSpeech architecture

  •    C

Project DeepSpeech is an open source Speech-To-Text engine. It uses a model trained by machine learning techniques, based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow project to make the implementation easier.

Flite - Fast Run time Synthesis Engine

  •    C

Flite (festival-lite) is a small, fast run-time synthesis engine developed at CMU and primarily designed for small embedded machines and/or large servers. Flite is designed as an alternative synthesis engine to Festival for voices built using the FestVox suite of voice building tools.

Google Speech Recognition Example

  •    

Google Speech Recognition contains a working example of application that uses google speech recognition API. App contains all necessary dlls to record, decode and send your voice request to google service and recieve a text representation of what you've said. It's developed i...

OpenSeq2Seq - Toolkit for efficient experimentation with various sequence-to-sequence models

  •    Python

This is a research project, not an official NVIDIA product. OpenSeq2Seq main goal is to allow researchers to most effectively explore various sequence-to-sequence models. The efficiency is achieved by fully supporting distributed and mixed-precision training. OpenSeq2Seq is built using TensorFlow and provides all the necessary building blocks for training encoder-decoder models for neural machine translation and automatic speech recognition. We plan to extend it with other modalities in the future.