py-kaldi-asr - Some simple wrappers around kaldi-asr intended to make using kaldi's (online) decoders as convenient as possible


Some simple wrappers around kaldi-asr intended to make using kaldi's online nnet3-chain decoders as convenient as possible. The target audience is developers who would like to use kaldi-asr as-is for speech recognition in their applications on GNU/Linux operating systems.

https://github.com/gooofy/py-kaldi-asr





Related Projects

kaldi-gstreamer-server - Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework

  •    Python

This is a real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework and implemented in Python. 2018-04-25: Server should now work with Tornado 5 (thanks to @Gastron). If using Python 2, you might need to install the futures package (pip install futures).

espnet - End-to-End Speech Processing Toolkit

  •    Shell

ESPnet is an end-to-end speech processing toolkit that focuses mainly on end-to-end speech recognition and end-to-end text-to-speech. ESPnet uses chainer and pytorch as its main deep learning engines, and also follows Kaldi-style data processing, feature extraction/format, and recipes to provide a complete setup for speech recognition and other speech processing experiments. To use cuda (and cudnn), make sure to set paths in your .bashrc or .bash_profile appropriately.

DeepSpeech - A PaddlePaddle implementation of DeepSpeech2 architecture for ASR.

  •    Python

DeepSpeech2 on PaddlePaddle is an open-source implementation of an end-to-end Automatic Speech Recognition (ASR) engine, based on Baidu's Deep Speech 2 paper and built on the PaddlePaddle platform. Our vision is to empower both industrial applications and academic research on speech recognition via an easy-to-use, efficient, and scalable implementation, including training, inference and testing modules, distributed PaddleCloud training, and demo deployment. In addition, several pre-trained models for both English and Mandarin are released. To avoid the trouble of environment setup, running in a Docker container is highly recommended. Otherwise follow the guidelines below to install the dependencies manually.

Kaldi - Speech Recognition Toolkit

  •    C++

Kaldi is a speech recognition research toolkit. It is similar in aims and scope to HTK. The goal is to have modern and flexible code, written in C++, that is easy to modify and extend.

ASR-Builder

  •    Python

ASR-Builder provides an easy-to-use interface to the HTK toolkit that allows users to build ASR systems. It performs housekeeping tasks when using HTK and also provides default training, testing, and recognition scripts.


delta - DELTA is a deep learning based natural language and speech processing platform.

  •    Python

DELTA is a deep learning based end-to-end natural language and speech processing platform. DELTA aims to provide easy and fast experiences for using, deploying, and developing natural language processing and speech models for both academia and industry use cases. DELTA is mainly implemented using TensorFlow and Python 3. For details of DELTA, please refer to this paper.

Speech Server .NET

  •    CSharp

Speech Server .NET aims to add Text-To-Speech (TTS) and Automatic Speech Recognition (ASR) functionality to handheld devices like Pocket PC and Smartphone, running Windows Mobile, that are wirelessly connected to a server. This server is able to generate a speech stream ...

wav2letter - Facebook AI Research Automatic Speech Recognition Toolkit

  •    Lua

wav2letter is a simple and efficient end-to-end Automatic Speech Recognition (ASR) system from Facebook AI Research. The original authors of this implementation are Ronan Collobert, Christian Puhrsch, Gabriel Synnaeve, Neil Zeghidour, and Vitaliy Liptchinsky. wav2letter implements the architecture proposed in Wav2Letter: an End-to-End ConvNet-based Speech Recognition System and Letter-Based Speech Recognition with Gated ConvNets.

ASR Setup Tool [318]


Developed by 318 Inc., ASR Setup Tool is an application for setting up Apple Software Restore ("ASR"). In the context of the ASR Setup Tool, ASR is used for setting up a multicast stream that can then be leveraged for imaging Mac OS X computers.

Alibaba-MIT-Speech - Alibaba speech technology


This is a PATCH file with the DFSMN-related code and example scripts for the LibriSpeech task. The patch is built on the Kaldi speech recognition toolkit at commit "04b1f7d6658bc035df93d53cb424edc127fab819".

visionworkbench - The NASA Vision Workbench is a general purpose image processing and computer vision library developed by the Autonomous Systems and Robotics (ASR) Area in the Intelligent Systems Division at the NASA Ames Research Center

  •    C++

The NASA Vision Workbench is a general purpose image processing and computer vision library developed by the Autonomous Systems and Robotics (ASR) Area in the Intelligent Systems Division at the NASA Ames Research Center.

loki

  •    Java

Speech workbench (asr, speech processing, toolkits)

faster-rnnlm - Faster Recurrent Neural Network Language Modeling Toolkit with Noise Contrastive Estimation and Hierarchical Softmax

  •    C++

In a nutshell, the goal of this project is to create an rnnlm implementation that can be trained on huge datasets (several billion words) and very large vocabularies (several hundred thousand words) and used in real-world ASR and MT problems. In addition, to achieve better results this implementation supports such praised setups as ReLU+DiagonalInitialization [1], GRU [2], NCE [3], and RMSProp [4]. How fast is it? Well, on the One Billion Word Benchmark [8] and a 3.3GHz CPU, the program with standard parameters (sigmoid hidden layer of size 256 and hierarchical softmax) processes more than 250k words per second in 8 threads, i.e. 15 million words per minute. As a result, an epoch takes less than one hour. Check the Experiments section for more numbers and figures.

AutoDMG - Create deployable system images from OS X installer

  •    Python

The award-winning AutoDMG takes a macOS installer (10.10 or newer) and builds a system image suitable for deployment with Imagr, DeployStudio, LANrev, Jamf Pro, and other asr-based imaging tools. Documentation and help are in the AutoDMG wiki.

Hindi ASR

  •    C++

Acoustic model developed using acoustic data recorded by native Hindi speakers.

openpls

  •    Java

An open and interoperable platform and/or API for specifying pronunciations according to the W3C Voice Browser Working Group PLS draft (http://www.w3.org/TR/pronunciation-lexicon/). These pronunciations are external and additional to the default lexicon of an existing TTS/ASR system.

Zanzibar Open IVR

  •    Java

Zanzibar is a complete, standards-based IVR. It includes an MRCPv2 server with ASR and TTS engines as well as a VoiceXML interpreter so that you can deploy and run VoiceXML applications. It integrates with VoIP PBXs (like Asterisk) using SIP and RTP.

voice-elements - Web Component wrapper to the Web Speech API, that allows you to do voice recognition and speech synthesis using Polymer

  •    HTML

Web Component wrapper to the Web Speech API, that allows you to do voice recognition (speech to text) and speech synthesis (text to speech) using Polymer.

p5.speech - Web Audio Speech Synthesis / Recognition for p5.js

  •    Javascript

p5.speech is a JavaScript library that provides simple, clear access to the Web Speech and Speech Recognition APIs, allowing for the easy creation of sketches that can talk and listen. It consists of two object classes (p5.Speech and p5.SpeechRec) along with accessor functions to speak and listen for text, change parameters (synthesis voices, recognition models, etc.), and retrieve callbacks from the system. Speech recognition requires launching from a server (e.g. a python simpleserver on a local machine).
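
The note about launching from a server can be illustrated with a minimal sketch using only Python's standard library. The helper name `make_local_server`, the default port 8000, and the choice of serving the current directory are assumptions made for this example; they are not part of p5.speech.

```python
import functools
import http.server


def make_local_server(directory=".", port=8000):
    """Build an HTTP server bound to localhost that serves static files.

    `directory` is the folder holding the sketch (index.html, sketch.js, ...).
    Passing port=0 lets the operating system pick a free port.
    """
    # SimpleHTTPRequestHandler accepts a `directory` keyword (Python 3.7+).
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `make_local_server().serve_forever()` from the sketch folder and opening `http://127.0.0.1:8000/` in a browser serves the page over HTTP rather than from a `file://` URL; stop the server with Ctrl+C.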